Pointer-size controlled instruction processing

ABSTRACT

Instruction set architectures (ISAs) and apparatus and methods related thereto comprise a variable length instruction set that includes one or more pointer-size controlled memory access instructions of a smaller length (e.g. 16 bits) wherein the size of the data accessed by such an instruction is dynamically determined based on the size of the pointer. Specifically, when a pointer-size controlled memory access instruction is received at a decode unit, the decode unit outputs one or more control signals to cause an execution unit to perform a memory access of a first size (e.g. 32 bits) when the pointer size is the first size (e.g. 32 bits), and outputs one or more control signals to cause the execution unit to perform a memory access of a second size (e.g. 64 bits) when the pointer size is the second size (e.g. 64 bits).

RELATED APPLICATIONS

This application claims the benefit of U.S. provisional patent applications “Implicit Global Pointer Relative Addressing for Global Memory Access” Ser. No. 62/552,855, filed Aug. 31, 2017, “Unified Logic” Ser. No. 62/552,796, filed Aug. 31, 2017, “Pointer-Size Controlled Instruction Processing” App. No. 62/552,841, filed Aug. 31, 2017, “Saving and Restoring Non-Contiguous Blocks of Preserved Registers” Ser. No. 62/552,830, filed Aug. 31, 2017, and “Unaligned Memory Accesses” Ser. No. 62/558,930, filed Sep. 15, 2017.

Each of the foregoing applications is hereby incorporated by reference in its entirety.

FIELD OF ART

This application relates generally to data manipulation and more particularly to reconfigurable fabric data routing.

BACKGROUND

People regularly interact with a wide variety of electronic systems. Common electronic systems include computers, smartphones, and tablet computers, while other electronic systems now appear in many familiar items, ranging from household appliances to vehicles. These electronic systems include integrated circuits or “chips” which, depending on the system in which the chips are used, can range from simple to highly complex. The chips are designed to perform a wide variety of system functions, and enable the systems to perform their functions effectively and efficiently. The chips are built using highly complex circuit designs, architectures, and system implementations. The chips are, quite simply, integral to the electronic systems. The chips are designed to implement system functions such as user interfaces, communications, processing, and networking. These system functions are applied to electronic systems used for business, entertainment, or consumer electronics purposes. The electronic systems routinely contain more than one chip. The chips implement critical system functions including computation, storage, and control. The chips support the electronic systems by computing algorithms and heuristics, handling and processing data, communicating internally and externally to the electronic system, and so on. Since the numbers of computations and other functions that must be performed are large, any improvements in chip efficiency contribute to a significant and substantial impact on overall system performance. As the amount of data to be handled increases, the approaches that are used must not only be effective, efficient, and economical, but must also scale as the amount of data increases.

Single processor architectures based on chips are well-suited for some computational tasks, but are unable to achieve the high performance levels required by some high-performance systems. Multiple single processors can be used together to boost performance. Parallel processing based on general-purpose processors can attain an increased level of performance; thus, parallelism is one approach for achieving increased performance. There is a wide variety of applications that demand high performance levels. Common applications requiring high performance include networking, image and signal processing, and large simulations, to name but a few. In addition to computing power, chip and system flexibility are important for adapting to ever-changing computational needs and technical situations.

System or chip reconfigurability is another approach that can address application demands. The system or chip attribute of reconfigurability is critical to many processing applications, as reconfigurable devices are extremely efficient for specific processing tasks. In certain circumstances, the cost and performance advantages of reconfigurable devices exist because the reconfigurable or adaptable logic enables program parallelism, which allows multiple computation operations to occur simultaneously. By comparison, conventional processors are often limited by instruction bandwidth and execution rate restrictions. Note that the high-density properties of reconfigurable devices can come at the expense of the high-diversity property that is inherent in other electronic systems, including microprocessors. Microprocessors have evolved to highly-optimized configurations that provide cost/performance advantages over reconfigurable systems for tasks that require high functional diversity. However, there are many tasks for which a conventional microprocessor is not the best design choice. A system architecture that supports configurable, interconnected processing elements can be an excellent alternative for many data-intensive applications such as Big Data.

SUMMARY

An instruction set architecture (ISA) defines the programmer-visible components and operations of a data processing apparatus, or processor, for example, a computer or microprocessor, and it typically defines, but is not limited to, organization of the memory (e.g. address space, addressability), the register set (e.g. the number of registers, the size of registers and how they are used) and the instruction set (e.g. a set of machine instructions that are supported by a hardware implementation of the ISA). The ISA acts as the interface between software and hardware. Specifically, software written in accordance with a particular ISA (e.g. written using the instructions in the defined instruction set) can be run on any hardware implementation of that ISA (e.g. any data processing apparatus that is configured to process the instructions in the defined instruction set).

Described herein are instruction set architectures (ISAs), apparatus, and methods related thereto, that comprise a variable length instruction set that includes one or more pointer-size controlled memory access instructions of a smaller length (e.g. 16 bits), wherein the size of the data accessed by such an instruction is dynamically determined based on the current pointer size. Specifically, when a pointer-size controlled memory access instruction is received at a decode unit of a data processing apparatus, the decode unit may output one or more control signals to cause an execution unit to perform a memory access of a first size (e.g. 32 bits) when the pointer size is the first size (e.g. 32 bits), and may output one or more control signals to cause the execution unit to perform a memory access of a second size (e.g. 64 bits) when the pointer size is the second size (e.g. 64 bits). This may improve the code density of programs generated according to such ISAs as it ensures that the commonly used memory access instructions (e.g. memory access instructions that access pointer-size data) are implemented using the smaller instruction length without wasting the smaller-length instruction space with memory access instructions that are not commonly used (i.e. memory access instructions that access other sized data).

Techniques are disclosed for processor instruction manipulation. Modeless architectures are supported by a 32-bit operation and a 64-bit operation, for example. A method for processor instruction manipulation is disclosed comprising: receiving an instruction for execution by an execution unit of a processor, the received instruction being an instruction from a variable length instruction set comprising one or more instructions of a first length and one or more instructions of a second length, wherein the first length is shorter than the second length and the one or more instructions of the first length comprising one or more pointer-size controlled memory access instructions; determining that the received instruction is one of the one or more pointer-size controlled memory access instructions; determining dynamically a size of data to be accessed based on a current pointer size; and outputting one or more control signals that cause the execution unit to perform a memory access of the dynamically determined size of data.

In embodiments, dynamically determining the size of data to be accessed based on the current pointer size comprises dynamically determining the size of data to be accessed to be a first size when the current pointer size is the first size and dynamically determining the size of data to be accessed to be a second size when the current pointer size is the second size. In embodiments, the one or more instructions of the second length comprise a corresponding memory access instruction to one of the one or more pointer-size controlled memory access instructions, the corresponding memory access instruction identifying that data of the first size is to be accessed. In embodiments, the processor comprises a modeless architecture. In embodiments, an instruction function is unmodified between N-bit and 2N-bit address spaces. In embodiments, the modeless architecture enables different addressing for code-density oriented instructions only. And in embodiments, the code-density oriented instructions are enabled for N-bit addressing and 2N-bit addressing, based on an implicitly defined address space.

In embodiments, a computer program product is embodied in a non-transitory computer readable medium for processor instruction manipulation, the computer program product comprising code which causes one or more processors to perform operations of: receiving an instruction for execution by an execution unit of a processor, the received instruction being an instruction from a variable length instruction set comprising one or more instructions of a first length and one or more instructions of a second length, wherein the first length is shorter than the second length and the one or more instructions of the first length comprise one or more pointer-size controlled memory access instructions; determining that the received instruction is one of the one or more pointer-size controlled memory access instructions; determining dynamically a size of data to be accessed based on a current pointer size; and outputting one or more control signals that cause the execution unit to perform a memory access of the dynamically determined size of data.

Various features, aspects, and advantages of various embodiments will become more apparent from the following further description.

BRIEF DESCRIPTION OF THE DRAWINGS

The following detailed description of certain embodiments may be understood by reference to the following figures wherein:

FIG. 1 is a block diagram of an example data processing apparatus configured to implement an ISA with an instruction set comprising one or more pointer-size controlled instructions.

FIG. 2 is a schematic diagram illustrating the decode of a pointer-size controlled memory access instruction when the pointer size is 32 bits.

FIG. 3 is a schematic diagram illustrating the decode of a pointer-size controlled memory access instruction when the pointer size is 64 bits.

FIG. 4 is a flow diagram of an example method of decoding instructions.

FIG. 5 is a flow diagram of an example method of determining the current pointer size.

FIG. 6 is a schematic diagram of a first example 16-bit pointer-size controlled load word/double word instruction.

FIG. 7 is a schematic diagram of an example 16-bit pointer-size controlled load word/double word stack pointer relative instruction.

FIG. 8 is a schematic diagram of an example 16-bit pointer-size controlled load word/double word global pointer relative instruction.

FIG. 9 is a schematic diagram of a second example 16-bit pointer-size controlled load word/double word instruction.

FIG. 10 is a schematic diagram of a first example 16-bit pointer-size controlled store word/double word instruction.

FIG. 11 is a schematic diagram of an example 16-bit pointer-size controlled store word/double word stack pointer relative instruction.

FIG. 12 is a schematic diagram of an example 16-bit pointer-size controlled store word/double word global pointer relative instruction.

FIG. 13 is a schematic diagram of a second example 16-bit pointer-size controlled store word/double word instruction.

FIG. 14 is a schematic diagram of an example 48-bit pointer-size controlled (double) add immediate global pointer instruction.

FIG. 15 is a schematic diagram of an example 32-bit pointer-size controlled (double) add immediate global pointer byte instruction.

FIG. 16 is a schematic diagram of an example 32-bit pointer-size controlled (double) add immediate global pointer word instruction.

FIG. 17 is a schematic diagram of an example 16-bit pointer-size controlled (double) add immediate stack pointer instruction.

FIG. 18 is a schematic diagram of a first example 32-bit pointer-size controlled (double) add immediate program counter instruction.

FIG. 19 is a schematic diagram of an example 48-bit pointer-size controlled (double) add immediate program counter instruction.

FIG. 20 is a schematic diagram of a second example 32-bit pointer-size controlled (double) add immediate program counter instruction.

FIG. 21 is a block diagram of an example integrated circuit manufacturing system for generating an integrated circuit embodying the data processing apparatus described herein.

FIG. 22 is a flow diagram for pointer-size controlled instruction processing.

FIG. 23 is a flow diagram for mode switching instructions.

FIG. 24 is a system diagram for pointer-size controlled instruction processing.

DETAILED DESCRIPTION

Instruction Set Architectures (ISAs) are generally categorized as CISC (complex instruction set computer) ISAs or RISC (reduced instruction set computer) ISAs. CISC ISAs typically have larger, more feature-rich instruction sets whereas RISC ISAs typically have smaller, simpler instruction sets. A program written in accordance with a CISC ISA is typically shorter than a program written in accordance with a RISC ISA since it often takes multiple RISC instructions to perform the same operation as one CISC instruction. However, data processing apparatus that implement RISC ISAs typically run at faster clock speeds because the clock speed is dictated by the slowest step in the pipeline and more complex instructions tend to be slower.

A RISC ISA may have a fixed length instruction set or a variable length instruction set. A fixed length instruction set comprises instructions that are all the same length (e.g. all instructions are 16 bits or all instructions are 32 bits). In contrast, a variable length instruction set comprises instructions of different lengths (e.g. some instructions may be 16 bits and other instructions may be 32 bits). Where an ISA has a variable length instruction set, the allocation of instructions to different lengths can significantly affect the code density of executable code generated in accordance with such an ISA.

As is known to those of skill in the art, the term “code density” describes the amount of space that the executable code for a program takes up in memory. This may also be referred to as the “memory footprint” of the program. The denser the code, the less space the code takes up in memory. Conversely, the less dense the code, the more space the code takes up in memory. Code density is particularly important when the program is to be executed by a data processing apparatus with a limited amount of memory, such as a mobile telephone or other embedded systems.

The code density of a program is based on the number of instructions required to perform each operation of the program and the number of bits per instruction. Generally, the fewer instructions it takes to perform operations and the fewer bits per instruction, the denser the code. Accordingly, the code density of programs generated in accordance with an ISA having a variable length instruction set is generally better when more of the instructions used are shorter length instructions (e.g. 16-bit instructions) rather than longer length instructions (e.g. 32-bit instructions).

As described above, the code density of programs generated in accordance with an ISA with a variable length instruction set will generally be better if more instructions of the program are shorter length instructions (e.g. 16-bit instructions). For example, a program with ten instructions will take up 320 bits if all ten instructions are 32-bit instructions, 240 bits if half of the instructions are 32-bit instructions and half of the instructions are 16-bit instructions, and only 160 bits if all ten instructions are 16-bit instructions. Accordingly, to improve the code density of programs generated in accordance with an ISA with a variable length instruction set, it is desirable to have more commonly used instructions implemented as shorter length instructions (e.g. 16 bits) which thereby increases the number of shorter length instructions (e.g. 16-bit instructions) in the executable code representing a program.
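By way of illustration only, the arithmetic of the ten-instruction example above may be reproduced with the following minimal sketch. The function name program_size_bits and its parameters are hypothetical names introduced here for illustration and are not part of any ISA described herein.

```python
def program_size_bits(num_short, num_long, short_bits=16, long_bits=32):
    """Return the total size, in bits, of a program with the given mix of
    shorter length and longer length instructions.

    The default lengths (16 and 32 bits) follow the example in the text.
    """
    return num_short * short_bits + num_long * long_bits


# The three instruction mixes from the ten-instruction example.
all_long = program_size_bits(num_short=0, num_long=10)    # ten 32-bit instructions
half_half = program_size_bits(num_short=5, num_long=5)    # five of each length
all_short = program_size_bits(num_short=10, num_long=0)   # ten 16-bit instructions

print(all_long, half_half, all_short)  # prints: 320 240 160
```

Each instruction moved from the longer length to the shorter length saves the difference between the two lengths, here 16 bits.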

Some of the most commonly used instructions are memory access instructions (e.g. loads or stores) that access (e.g. read or write) a pointer-size amount of data in memory. As is known to those of skill in the art, a pointer is a variable that is used to store an address in memory. The pointer size is thus dependent on the size of the addressable memory. Pointers are typically either 32 bits or 64 bits, but it will be evident to a person of skill in the art that pointers may be other sizes. The size of the addressable memory and thus the size of the pointers may be determined by the size of the physical memory, the capabilities of the data processing apparatus implementation of the ISA, the operating system (O/S), the program, or a combination thereof. For example, a 32-bit data processing apparatus may only be able to process 32-bit values and thus the pointers may be fixed at 32 bits. However, a 64-bit data processing apparatus may be able to process 64-bit values, but may allow programs running thereon to use 32-bit pointers or 64-bit pointers.

Accordingly, it is desirable for memory access instructions that access a pointer-size amount of data in memory to be implemented using shorter length instructions (e.g. 16-bit instructions). However, since the size of the pointer varies between data processing apparatuses and/or programs, which of the memory access instructions are commonly used instructions varies between data processing apparatuses and/or programs. For example, in some cases (e.g. when 32-bit pointers are used), memory access instructions that access 32 bits of memory are commonly used and memory access instructions that access 64 bits of memory are not very commonly used, and in other cases (e.g. when 64-bit pointers are used), memory access instructions that access 64 bits of memory are commonly used and memory access instructions that access 32 bits of memory are not very commonly used. Due to the limited number of instructions that can be implemented as smaller sized instructions, it may not be an efficient use of the smaller-sized instruction space (e.g. 16-bit instruction space) to have all varieties of memory access instructions that access data of all possible pointer sizes (e.g. memory access instructions that access 32 bits of memory and memory access instructions that access 64 bits of memory) implemented using smaller-sized instructions. This may “waste” some of the smaller-sized instruction space on memory access instructions that are not very commonly used.

Accordingly, described herein are instruction set architectures (ISAs), and apparatus and methods related thereto, that comprise a variable length instruction set that includes one or more pointer-size controlled memory access instructions of a smaller length (e.g. 16 bits) wherein the size of the data accessed by such an instruction is dynamically determined based on the current pointer size. Specifically, when a pointer-size controlled memory access instruction is received at a decode unit, the decode unit may output one or more control signals to cause an execution unit to perform a memory access of a first size (e.g. 32 bits) when the pointer size is the first size (e.g. 32 bits), and may output one or more control signals to cause the execution unit to perform a memory access of a second size (e.g. 64 bits) when the pointer size is the second size (e.g. 64 bits). This may improve the code density of programs generated according to such ISAs as it ensures that the commonly used memory access instructions (e.g. memory access instructions that access pointer-size data) are implemented using smaller length instructions without wasting the smaller-length instruction space with memory access instructions that are not commonly used.

Processor architectures have been routinely categorized by describing either the underlying hardware architecture or microarchitecture of a given processor, or by referencing the instruction set executed by the processor. The latter, the ISA, describes the types and ranges of instructions available, rather than how the instructions are implemented in hardware. By referencing an instruction set, a given ISA can be implemented using a wide range of techniques, where the techniques can be chosen based on preference or need for execution speed, data throughput, power dissipation, and manufacturing cost, among many other criteria. The ISA serves as an interface between code that is to be executed on the processor and the hardware that implements the processor. ISAs, and the processors or computers based on them, are partitioned broadly into categories including complex instruction set computers (CISC) and reduced instruction set computers (RISC). The ISAs define types of data that can be processed; the state or states of the processor, where the state or states include the main memory and a variety of registers; and the semantics of the ISA. The semantics of the ISA typically include modes of memory addressing and memory consistency. In addition, the ISA defines the instruction set for the processor, whether there are many instructions (complex) or fewer instructions (reduced), and the model for control signals and data that are input and output. RISC architectures offer many advantages for processor design because, by reducing the numbers and variations of instructions, the hardware that implements the instructions can be simplified. Further, compilers, assemblers, linkers, etc., that convert the code to instructions executable by the architecture can be simplified and tuned for performance.

In order for a processor to process data, the data must be made available to the processor or process. As discussed throughout, pointers can be used to share data between and among processors, processes, etc., by providing a reference address or pointer to the data. The pointer can be provided rather than transferring the data to each processor or process that requires the data. The pointers that are used for passing data references can be local pointers known only to a given, local processor or process, or can be global pointers (GPs). The GPs can be shared among multiple processors or processes. The GPs can be organized or grouped into a GP register. The registers can include general-purpose registers, floating point registers, and so on. While operating systems such as Linux™ can use a GP for position independent code (PIC), the use of the GP implies that a particular register is explicitly used to support PIC handling and execution. In contrast, the presently described RISC architecture uses instructions that implicitly reference a GP source. The GP source provides operands manipulated by the instructions. Use of instructions that implicitly use GP source operands allows bits within the instructions to be used for purposes other than explicitly referencing GP registers. The result of implicit GP source operands is that the instructions can free the bits previously used to declare the GP, and can therefore provide longer address offsets, extended register ranges, and so on.
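By way of illustration only, the effect of freeing the bits that would otherwise name the GP register may be estimated with the following minimal sketch. The field widths used (a 32-bit instruction, a 6-bit opcode, and 5-bit register fields) are assumptions chosen for the example and are not taken from any encoding described herein; the function name offset_field_bits is likewise hypothetical.

```python
def offset_field_bits(instruction_bits, opcode_bits, register_fields,
                      register_field_bits=5):
    """Return the number of bits left for an address offset once the opcode
    and the explicit register fields have been allocated.

    All widths are assumed values for illustration only.
    """
    return instruction_bits - opcode_bits - register_fields * register_field_bits


# Explicit form: the instruction names both a target register and the GP register.
explicit_gp = offset_field_bits(32, 6, register_fields=2)

# Implicit form: the GP source is implied, so only the target register is named.
implicit_gp = offset_field_bits(32, 6, register_fields=1)

# Each freed bit doubles the range of offsets that can be encoded.
reach_ratio = 2 ** implicit_gp // 2 ** explicit_gp

print(explicit_gp, implicit_gp, reach_ratio)  # prints: 16 21 32
```

Under these assumed widths, removing one explicit 5-bit register field lengthens the offset field by five bits. The freed bits could equally be spent on extended register ranges rather than on the offset.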

A further capability of the presently described architecture includes support of the rotate and exchange or ROTX instruction. This instruction can support a variety of data operations such as bit reversal, bit swap, byte reversal, byte swap, shifting, striping, and so on, all within one instruction. The use of the ROTX instruction provides a computationally inexpensive technique for implementing multiple instructions within one instruction. The rotate and exchange instruction can overlay a barrel shifter or other shifter commonly available in the presently described architecture. Separately implementing these various rotate, exchange, or shift instructions would increase central processing unit (CPU) complexity because each instruction would have an impact on one or more aspects of the CPU design. By merging the various instructions into the ROTX instruction, CPU hardware that implemented the separate instructions can be combined to result in a less complex processor.

Processors commonly include a “mode” designator to indicate that the mode in which a processor is operating is based on a number of bytes, words, and so on. For some processor architecture techniques, a mode can include a 16-bit operation, a 32-bit operation, a 64-bit operation, and so on. One or more bits within an instruction can be used to indicate the mode in which a particular instruction is to be executed. In contrast, if the processor is designed to operate without mode bits within each instruction, then the mode bits within each instruction can be repurposed. The repurposed bits within the instruction can be used to implement the longer address offsets or extended register ranges described elsewhere. When an operation “mode” is still needed for a particular operation, then instructions that are code-density oriented can be added. Specific instructions can be implemented for 16-bit, 32-bit, 64-bit, etc., operations when needed, rather than implementing every instruction to include bits to define a mode, whether the mode is relevant to the instruction or not.

Storage used by processors can be organized and addressed using a variety of techniques. Typically, the storage or memory is organized as groups of bytes, words, or some other convenient size. To make storage or memory access more efficient, the access acquires as much data as reasonable with each access, thus reducing the number of accesses. Access to the memory is often most efficient in terms of computation or data transfer when the access is oriented or “aligned” to boundaries such as word boundaries. However, data to be processed does not always conveniently align to boundaries. For example, the operations to be performed by a processor may be byte oriented, the amount of data in memory may align to a byte boundary but not a word boundary, and so on. Accessing specific content such as a byte can require, under certain conditions and depending on the implementation of the processor, multiple read operations. To improve computational efficiency, unaligned memory access can be required. The unaligned memory access may be needed for computational if not access efficiency. A given ISA can support explicit unaligned storage or memory accesses. The general forms of the load and store instructions for the ISA can include unaligned load instructions and unaligned store instructions. The unaligned load instructions and the unaligned store instructions support a balance or trade-off between increased density of the code that is executed by a processor and reduced processor complexity. The unaligned load instructions and the unaligned store instructions can be implemented in addition to the standard load instructions and store instructions, where the latter instructions align to boundaries such as word boundaries. When an unaligned load or store is performed, the “extra” data such as bytes that can be accessed can be held temporarily for potential use by a subsequent read or store instruction (e.g., data locality).
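By way of illustration only, the following minimal sketch counts how many aligned word reads are needed to cover a given access, which shows why an access that crosses a word boundary can require multiple read operations. The 4-byte word width and the name aligned_reads_needed are assumptions made for the example; an actual implementation may handle unaligned accesses differently.

```python
WORD_BYTES = 4  # assumed width of one aligned memory access


def aligned_reads_needed(address, size, word_bytes=WORD_BYTES):
    """Count the aligned word reads that cover bytes [address, address + size)."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_word = address // word_bytes               # word holding the first byte
    last_word = (address + size - 1) // word_bytes   # word holding the last byte
    return last_word - first_word + 1


aligned_word = aligned_reads_needed(0x1000, 4)    # starts on a word boundary
unaligned_word = aligned_reads_needed(0x1002, 4)  # straddles two words
single_byte = aligned_reads_needed(0x1003, 1)     # one byte inside one word

print(aligned_word, unaligned_word, single_byte)  # prints: 1 2 1
```

In the unaligned case, the two words that are read contain four bytes beyond those requested; these are the “extra” bytes that may be held for a subsequent access.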

For various reasons, execution of code can be stopped at a point in time and restarted at a later point in time, after a duration of time, and so on. The stopping and restarting of code execution can result from an exception occurring, receiving a control signal such as a fire signal or done signal, detection of an interrupt signal, and so on. In order to efficiently handle save and restore operations, an ISA can include instructions and hardware specifically tuned for the save and restore operations. A save instruction can save registers, where the registers can be stored in a stack. The saved registers can include source registers. A stack pointer can be adjusted to account for the stored registers. The saving can also include storing a local stack frame, where a stack frame can include a collection of data (or registers) on a stack that is associated with an instruction, a subprogram call, a function call, etc., that caused the save operation. The restore operation can reverse the save technique. The registers that were saved by the save operation can be restored. The restored registers can include destination registers. When the registers have been restored, the restore operation can cause a jump to a return address. Code execution can continue beginning with the return address.
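By way of illustration only, the save and restore operations described above may be modeled with the following minimal sketch. The register names (sp, ra, s0, s1), the downward-growing stack, the 4-byte slot size, and the function names save and restore are all assumptions made for the example, and the sketch is a behavioral model rather than a description of any particular instruction encoding.

```python
def save(regs, memory, names, slot_bytes=4):
    """Store the named registers on a downward-growing stack, adjusting the
    stack pointer to account for each stored register."""
    for name in names:
        regs["sp"] -= slot_bytes
        memory[regs["sp"]] = regs[name]


def restore(regs, memory, names, slot_bytes=4):
    """Reverse a matching save: reload the registers in reverse order,
    readjust the stack pointer, and return the address to jump to."""
    for name in reversed(names):
        regs[name] = memory[regs["sp"]]
        regs["sp"] += slot_bytes
    return regs["ra"]  # execution continues at the restored return address


# Hypothetical register state and an empty memory model.
regs = {"sp": 0x8000, "ra": 0x400, "s0": 11, "s1": 22}
memory = {}
saved = ["ra", "s0", "s1"]

save(regs, memory, saved)
sp_after_save = regs["sp"]         # three 4-byte slots below the original sp

regs.update(ra=0, s0=0, s1=0)      # the interrupted code's values are overwritten
jump_target = restore(regs, memory, saved)
```

After the restore, the registers and the stack pointer hold their original values, and jump_target holds the saved return address.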

Reference is now made to FIG. 1 which illustrates an example data processing apparatus 100 that implements the ISA described herein that comprises an instruction set that includes one or more pointer-size controlled memory access instructions. A data processing apparatus is any device, machine, or dedicated circuit, such as, but not limited to, a processor, computer, or computer system, with processing capability such that it can execute instructions. A processor may be any kind of general-purpose or dedicated processor, such as a CPU, GPU, System-on-chip, state machine, media processor, application-specific integrated circuit (ASIC), programmable logic array, field-programmable gate array (FPGA), or the like. A computer or computer system may comprise one or more processors.

The example data processing apparatus 100 of FIG. 1 comprises a register file 102, a decode unit 104, and an execution unit 106. It will be evident to a person of skill in the art that the data processing apparatus 100 of FIG. 1 may comprise other components that are not shown, such as, but not limited to, a fetch unit and input/output interface(s).

The register file 102 comprises a plurality of general-purpose registers which can be written to, and read from, by the execution unit 106. The general-purpose registers may include, but are not limited to, a global pointer register that is configured to point to an address in memory 108 where data is stored, which may be used to access global data; and a stack pointer which is configured to point to an address in memory 108 representing the bottom of the stack used for subroutine calls.

The decode unit 104 is configured to receive computer executable instructions representing a program or subroutine, analyze each received instruction to identify one or more operations (e.g. load, store, add, subtract, jump, branch) to be performed and the operands (e.g. the data that is to be operated on or manipulated) of the operation (if any), and output one or more control signals to cause the execution unit 106 to perform the identified operation(s) using the identified operand(s). The computer executable instructions may be provided to the decode unit 104 by a fetch stage (not shown) that is configured to fetch instructions of a program or subroutine (in program/subroutine order) from memory 108 as indicated by a program counter (PC).

The computer executable instructions received by the decode unit 104 are based on an instruction set architecture (ISA) that comprises a variable length instruction set with at least two different length instructions. Specifically, the instruction set comprises one or more instructions of a first length and one or more instructions of a second length wherein the first length is shorter than the second length. For example, the instruction set may comprise one or more 16-bit instructions and one or more 32-bit instructions. The instruction set may also, or alternatively, comprise one or more instructions of other lengths. For example, the instruction set may also, or alternatively, comprise instructions of a third length (e.g. 48 bits) and/or a fourth length (e.g. 64 bits) and so on.

In some cases, one or more of the instructions of the first or smallest length (e.g. 16 bits) may be compressed versions of corresponding instructions of the second or larger length (e.g. 32 bits); such an instruction is referred to as a compressed instruction. A compressed instruction is an instruction in which, due to the smaller size of the instruction, one or more of the arguments or operands is presented in a compressed or abbreviated format. For example, where a register may be explicitly identified by a five-bit number in a 32-bit instruction, a register may be identified in a 16-bit instruction by a 3-bit index to a lookup table.
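The register lookup described above can be sketched as follows. This is an illustrative sketch only: the description does not give the contents of the lookup table, so the table values and the name `expand_register` are hypothetical.

```python
# Hypothetical table: maps a 3-bit index from a 16-bit instruction to a
# 5-bit register number. The actual mapping is not given in the description.
COMPRESSED_REG_TABLE = (16, 17, 18, 19, 4, 5, 6, 7)


def expand_register(index3: int) -> int:
    """Return the full register number for a 3-bit compressed index."""
    if not 0 <= index3 < len(COMPRESSED_REG_TABLE):
        raise ValueError("compressed register index must fit in 3 bits")
    return COMPRESSED_REG_TABLE[index3]
```

With three index bits only eight of the 32 registers can be named, which is the cost of the shorter encoding.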

The one or more instructions of the first or smallest length (e.g. 16 bits) in the instruction set include one or more pointer-size controlled memory access instructions. A pointer-size controlled instruction is an instruction where the operation that is performed in response to the instruction is dependent on the pointer size. A memory access instruction is an instruction that causes a memory access (read or write) of data in memory. Examples of memory access instructions include, but are not limited to, load instructions which cause data to be read from memory, and store instructions which cause data to be written to memory. A memory access instruction typically explicitly identifies the size of data (e.g. byte, half word (2 bytes), word (4 bytes), double word (8 bytes)) to be accessed. However, the size of the data that is accessed by a pointer-size controlled memory access instruction is dynamically determined by the decode unit 104 based on the pointer size currently being used by the data processing apparatus.

Specifically, the decode unit 104 is configured to, in response to determining that a received instruction is one of the one or more pointer-size controlled memory access instructions, dynamically determine the size of the data that is to be accessed based on the current pointer size, and output one or more control signals that cause the execution unit 106 to perform a memory access of the dynamically determined size. For example, the decode unit 104 may be configured to determine the size of the data to be accessed by a pointer-size controlled memory access instruction to be a first size (e.g. 32 bits) when the pointer size is the first size (e.g. 32 bits) as shown in FIG. 2 and to determine the size of the data to be accessed to be a second size (e.g. 64 bits) when the pointer size is the second size (e.g. 64 bits) as shown in FIG. 3.

As described above, memory access instructions which access pointer-size data in memory are very commonly used instructions and thus having the data size of one or more memory access instructions of the smallest instruction length dynamically controlled by the pointer size allows these commonly used instructions to be in the smallest length instruction set (which improves code density) without having to “waste” the smallest length instruction space on instructions that are not commonly used instructions.

The decode unit 104 may be configured to determine that a received instruction is a pointer-size controlled memory access instruction based on the bit pattern of the received instruction. For example, each of the one or more pointer-size controlled memory access instructions may have a unique recognizable bit pattern (e.g. certain bits of the instruction have a recognizable pattern) that identifies it as a pointer-size controlled memory access instruction. For example, in some cases, each pointer-size controlled memory access instruction may have a unique opcode. Example pointer-size controlled memory access instructions are described below with reference to FIGS. 6 to 13.

As described above, the pointer size currently being used by a data processing apparatus may be based on the size of the physical memory, the capabilities of the data processing apparatus (e.g. whether the data processing apparatus is a 32-bit data processing apparatus or a 64-bit data processing apparatus), the operating system (O/S) being run on the data processing apparatus, the program being run on the data processing apparatus, or a combination thereof. Accordingly, in some cases the pointer size may dynamically change based on the program being run on the data processing apparatus. Therefore, in some cases, the decode unit 104 may comprise pointer size logic 110 that is configured to determine the current pointer size. In some examples, the pointer size logic 110 may be configured to determine that the pointer size is 32 bits if the data processing apparatus does not support 64-bit addressing (e.g. if the data processing apparatus is a 32-bit processor); and, if the data processing apparatus does support 64-bit addressing (e.g. if the data processing apparatus is a 64-bit or higher data processing apparatus), to determine the pointer size based on the current operating mode of the data processing apparatus. For example, in some cases the data processing apparatus may be configured to operate in one of kernel mode, supervisor mode, or user mode based on the program that is currently running on the data processing apparatus. Different pointer sizes may be used in different modes and the pointer size for a particular mode may be identified via one or more configuration settings. For example, the pointer size for kernel mode may be identified by a KX bit, the pointer size for supervisor mode may be identified by an SX bit, and the pointer size for user mode may be identified by a UX bit. An example method for determining the current pointer size is described below with reference to FIG. 5.

The execution unit 106 is configured to execute the decoded instructions received from the decode unit 104 (i.e. to perform the operations identified by the control signals received from the decode unit 104 using any operands identified by the control signals received from the decode unit 104). In some cases, the execution unit 106 may comprise one or more arithmetic logic units (ALUs). The execution unit 106 may comprise one or more sub-units dedicated to performing certain functions. For example, the execution unit 106 may comprise a sub-unit for executing store instructions and/or a sub-unit for executing load instructions.

In some cases, the one or more pointer-size controlled memory access instructions may comprise one or more pointer-size controlled load instructions. As described above, a load instruction is an instruction that causes a read of memory to be performed. A pointer-size controlled load instruction is a load instruction in which the size of the data that is read is dynamically determined based on the current pointer size. In these cases, the decode unit 104 is configured to, in response to determining that a received instruction is a pointer-size controlled load instruction, output one or more control signals which cause the execution unit 106 to read the dynamically determined size of data from memory.

A load instruction typically identifies the memory address to be read via one or more operands. The mechanism by which the address is identified is based on the address mode supported by the ISA which is implemented by the data processing apparatus. In an ISA that uses a displacement addressing mode, addresses in memory are identified using a base register and an offset. In these cases, the one or more pointer-size controlled load instructions may include a plurality of pointer-size controlled load instructions that use different means for identifying the base register. For example, the one or more pointer-size controlled load instructions may comprise a pointer-size controlled load instruction that explicitly identifies (e.g. by number) the base register; and a pointer-size controlled load instruction that implicitly identifies (e.g. by the opcode) the base register. For example, the one or more pointer-size controlled load instructions may comprise the L [W/D] [16] instruction described with reference to FIG. 6 which explicitly identifies the base register by number, and/or the L [W/D] [4×4] instruction described with reference to FIG. 9 which explicitly identifies the base register by number; and the L [W/D] [SP] instruction described with reference to FIG. 7 which implicitly identifies the stack pointer register as the base register and/or the L [W/D] [GP16] instruction described with reference to FIG. 8 which implicitly identifies the global pointer register as the base register. Having load instructions that implicitly identify the base register allows the bits that would otherwise be used to explicitly identify the base register to be used for another purpose (e.g. to extend the offset range).

In these cases, the one or more pointer-size controlled load instructions may additionally or alternatively include a plurality of pointer-size controlled load instructions which explicitly identify the base register, but use a different number of bits to identify the base register. Specifically, one pointer-size controlled load instruction may use a first number of bits (e.g. 3 bits) to identify the base register, whereas another pointer-size controlled load instruction may use a second number of bits (e.g. 4 bits) to identify the base register. The difference in remaining available bits may result in a different number of bits being used for the offset between these examples. For example, the one or more pointer-size controlled load instructions may comprise the L [W/D] [16] instruction described with reference to FIG. 6 which uses 3 bits to identify the base register and 4 bits to identify an offset; and/or the L [W/D] [4×4] instruction described below with reference to FIG. 9 which uses 4 bits to identify the base register and only 2 bits to identify an offset. Having different load instructions which use a different number of bits to identify the base register allows there to be a trade-off between the number of registers that can be identified/used and the range of offsets that can be calculated therefrom.

In some cases, in addition or alternatively to comprising one or more pointer-size controlled load instructions, the one or more pointer-size controlled memory access instructions may comprise one or more pointer-size controlled store instructions. As described above, a store instruction is an instruction that causes a write to memory to be performed. A pointer-size controlled store instruction is a store instruction in which the size of the data that is written to memory is dynamically determined based on the current pointer size. In these cases, the decode unit 104 is configured to, in response to determining that a received instruction is a pointer-size controlled store instruction, dynamically determine the size of data to be written to memory based on the current pointer size, and output one or more control signals which cause the execution unit 106 to write the dynamically determined size of data to memory.

A store instruction typically identifies the memory address to be written to. As described above the mechanism by which the address is identified is based on the address mode supported by the ISA implemented by the data processing apparatus. In an ISA that uses a displacement addressing mode, addresses in memory are identified using a base register and an offset. In these cases, the one or more pointer-size controlled store instructions may include a plurality of pointer-size controlled store instructions that identify the base register in different manners. For example, the one or more pointer-size controlled store instructions may comprise a pointer-size controlled store instruction that explicitly identifies (e.g. by number) the base register; and a pointer-size controlled store instruction that implicitly identifies (e.g. via the opcode) the base register. For example, the one or more pointer-size controlled store instructions may comprise the S [W/D] [16] instruction described with reference to FIG. 10 which explicitly identifies the base register by number, and/or the S [W/D] [4×4] instruction described with reference to FIG. 13 which explicitly identifies the base register by number; and the S [W/D] [SP] instruction described with reference to FIG. 11 which implicitly identifies the stack pointer register as the base register and/or the S [W/D] [GP16] instruction described with reference to FIG. 12 which implicitly identifies the global pointer register as the base register. Having store instructions that implicitly identify the base register allows the bits that would otherwise be used to explicitly identify the base register to be used for another purpose (e.g. to extend the offset range).

In these cases, the one or more pointer-size controlled store instructions may additionally or alternatively include a plurality of store instructions which explicitly identify the base register but use a different number of bits to identify the base register. Specifically, one pointer-size controlled store instruction may use a first number of bits (e.g. 3 bits) to identify the base register whereas another pointer-size controlled store instruction may use a second number of bits (e.g. 4 bits) to identify the base register. As described above, the different number of bits used to identify the base register may result in the different instructions having a different number of offset bits. For example, the one or more pointer-size controlled store instructions may comprise the S [W/D] [16] instruction described with reference to FIG. 10 which uses 3 bits to identify the base register and 4 bits to identify an offset, and the S [W/D] [4×4] instruction described with reference to FIG. 13 which uses 4 bits to identify the base register and only 2 bits to identify an offset. Having different store instructions which use a different number of bits to identify the base register allows a trade-off between the number of registers that can be identified/used and the range of offsets that can be calculated therefrom.

In some cases, where addresses in memory are identified using a base register and an offset, in addition to the decode unit 104 being configured to dynamically determine the size of data to be accessed, the decode unit 104 may also be configured to, in response to determining that the received instruction is one of the one or more pointer-size controlled memory access instructions, dynamically determine a unit (e.g. words or double words) of the offset based on the current pointer size. For example, as described below in reference to FIGS. 6 to 13, the decode unit 104 may be configured to set the last two bits of the offset to zero so that the offset is in words when the current pointer size is 32 bits; and set the last three bits of the offset to zero so that the offset is specified in double words when the current pointer size is 64 bits. This allows the offset range to vary based on the pointer-size. For example, where a pointer-size controlled memory access instruction has three offset bits, this allows a 5-bit offset (the three specified bits plus the two zero bits) when the current pointer size is 32 bits and a 6-bit offset (the three specified bits plus the three zero bits) when the current pointer size is 64 bits.
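The offset scaling described above can be illustrated with a short sketch. The function name `scaled_offset` is hypothetical, and the sketch assumes only the two pointer sizes discussed (32 and 64 bits).

```python
def scaled_offset(offset_field: int, pointer_size_bits: int) -> int:
    """Scale an encoded offset field to a byte offset.

    Two zero bits are appended when pointers are 32 bits (offset in words);
    three zero bits are appended when pointers are 64 bits (double words).
    """
    if pointer_size_bits == 32:
        zero_bits = 2
    elif pointer_size_bits == 64:
        zero_bits = 3
    else:
        raise ValueError("this sketch only models 32-bit and 64-bit pointers")
    return offset_field << zero_bits


# Largest value of a three-bit offset field, as in the example in the text.
max_offset_32 = scaled_offset(0b111, 32)  # 0b11100, a 5-bit byte offset
max_offset_64 = scaled_offset(0b111, 64)  # 0b111000, a 6-bit byte offset
```

The same three encoded bits therefore reach twice as far in bytes when the pointer size is 64 bits.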

In some cases, the one or more instructions of the second/longer length (e.g. 32 bits) may comprise one or more memory access instructions that correspond to one of the pointer-size controlled memory access instructions of the smallest length (e.g. 16 bits), but identify the size of the data to be accessed as the first size (e.g. 32 bits) or the second size (e.g. 64 bits) independent of the pointer-size. A longer length (e.g. 32 bits) memory access instruction is said to correspond to a pointer-size controlled memory access instruction if the longer length memory access instruction and the pointer-size controlled memory access instruction can cause the execution unit to perform the same operation. For example, a 16-bit pointer-size controlled load instruction that implicitly identifies a particular base register corresponds to both a 32-bit load word instruction that implicitly identifies the particular base register and a 32-bit load word instruction in which the particular base register is explicitly identified, as all three instructions cause a load from an identified address (base address+offset) in memory relative to the particular base register. In these cases, in response to determining that the received instruction is an instruction that corresponds to a pointer-size controlled memory access instruction, the decode unit 104 is configured to output one or more control signals to perform a memory access of the identified size.

Having an instruction set that has both a pointer-size controlled memory access instruction (an instruction in which the size of the data accessed is dynamically selected based on the pointer size to be one of at least a first size (e.g. 32 bits) and a second size (e.g. 64 bits)); and a corresponding memory access instruction that identifies the size of the data to be accessed as the first size (e.g. 32 bits), allows a memory access of the first size to be performed via the corresponding instruction even when the pointer size is such that the pointer-size controlled memory access will cause a data access of another size (e.g. 64 bits) to be performed. For example, the instruction set may comprise a pointer-size controlled load instruction which will cause a load of 32 bits from memory to be performed when the pointer size is 32 bits and will cause a load of 64 bits from memory to be performed when the pointer size is 64 bits; and a load instruction which causes a load of 32 bits from memory to be performed regardless of the pointer size. In this example, the second instruction allows a load of 32 bits to be performed even when the pointer size is 64 bits.
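The difference between the two kinds of load can be sketched in software as follows. This is a behavioural model only, under stated assumptions: memory is modelled as a byte string, little-endian byte order is assumed, and the function names are hypothetical.

```python
def load(memory: bytes, address: int, size_bits: int) -> int:
    """Read size_bits of data at address (little-endian is an assumption)."""
    size_bytes = size_bits // 8
    return int.from_bytes(memory[address:address + size_bytes], "little")


def pointer_sized_load(memory: bytes, address: int, pointer_size_bits: int) -> int:
    """Pointer-size controlled load: the access size follows the pointer size."""
    return load(memory, address, pointer_size_bits)


def load_word(memory: bytes, address: int, pointer_size_bits: int) -> int:
    """Corresponding fixed-size load: always 32 bits, whatever the pointer size."""
    return load(memory, address, 32)


memory = bytes(range(1, 17))  # sample contents: bytes 0x01 .. 0x10
```

With 64-bit pointers, `pointer_sized_load` reads eight bytes while `load_word` still reads four, which is why both instructions are useful in the same instruction set.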

In some cases, the one or more instructions of the first/shorter length or the one or more instructions of the second/longer length may comprise one or more pointer-size controlled arithmetic instructions. An arithmetic instruction is an instruction that causes an arithmetic operation to be performed on data of a particular size. A pointer-size controlled arithmetic instruction is an instruction in which the size of the data on which the arithmetic operation is performed is dynamically determined based on the current pointer size.

In these cases, the decode unit 104 is configured to, in response to determining that a received instruction is a pointer-size controlled arithmetic instruction, dynamically determine the size of the data on which the arithmetic operation is to be performed based on the current pointer size, and output one or more control signals to cause the execution unit to perform an arithmetic operation on data of the dynamically determined size. In some cases, the decode unit 104 may be configured to dynamically determine the size of the data on which the arithmetic operation is to be performed to be the current pointer size. Specifically, the decode unit 104 may be configured to determine the size of the data on which the arithmetic operation is to be performed to be the first size (e.g. 32 bits) when the current pointer size is the first size (32 bits) (i.e. when the pointers are currently configured to be the first size) and determine the size of the data on which the arithmetic operation is to be performed to be the second size (e.g. 64 bits) when the current pointer size is the second size (e.g. 64 bits). Thus in embodiments, the one or more pointer-size controlled arithmetic instructions comprise at least one pointer-size controlled arithmetic instruction wherein the data upon which an arithmetic operation associated with the arithmetic instruction is performed is a pointer.

The one or more pointer-size controlled arithmetic instructions may comprise one or more pointer-size controlled add immediate instructions. An add immediate instruction adds an immediate value or constant to data of a particular size. A pointer-size controlled add immediate instruction is an instruction that adds an immediate or constant to data wherein the size of the data is dynamically determined based on the current pointer size. In some cases, the one or more pointer-size controlled add immediate instructions comprise one or more instructions that cause an immediate to be added to a pointer register, such as the global pointer, the stack pointer, or the program counter. Generally, add immediate instructions explicitly identify the size of the data to which the immediate is to be added. However, since the decode unit knows the size of the pointers, where an add immediate instruction adds data to a pointer, the instruction no longer needs to identify the size of the data: the decode unit 104 can automatically select the correct-sized data (i.e. the pointer size). Thus, a pointer-size controlled add immediate instruction that adds data to a pointer can free up space in the larger instruction space (e.g. 32-bit instruction space) because it can be used to replace multiple add immediate instructions that each specify a different data size. For example, a pointer-size controlled add immediate global pointer instruction may replace both an add immediate global pointer instruction that adds an immediate to the global pointer to generate a 32-bit value and a double add immediate global pointer instruction that adds an immediate to the global pointer to generate a 64-bit value. Example pointer-size controlled add immediate instructions that add an immediate to a pointer are described below with reference to FIGS. 14 to 20.
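As an illustration, a pointer-size controlled add immediate can be modelled as a single operation whose result width follows the pointer size. The sketch below assumes the result simply wraps modulo 2 to the power of the pointer size; how a real implementation extends or truncates the result is not stated here, and the function name is hypothetical.

```python
def add_immediate(pointer_value: int, immediate: int, pointer_size_bits: int) -> int:
    """Add an immediate to a pointer, producing a pointer-sized result.

    Assumption: the result wraps to pointer_size_bits (unsigned arithmetic).
    """
    if pointer_size_bits not in (32, 64):
        raise ValueError("this sketch only models 32-bit and 64-bit pointers")
    mask = (1 << pointer_size_bits) - 1
    return (pointer_value + immediate) & mask
```

One function stands in for both the 32-bit and the 64-bit variant, mirroring how one pointer-size controlled encoding can replace two size-specific encodings.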

In some cases, the one or more pointer-size controlled add immediate instructions comprise at least two pointer-size controlled add immediate instructions that identify a different size immediate to be added to the data. For example, the one or more pointer-size controlled add immediate instructions may comprise a pointer-size controlled add immediate instruction that adds an immediate of a first size (e.g. 32 bits) to the data, and another pointer-size controlled add immediate instruction that adds an immediate of a second size (e.g. 21 bits) to the data. For example, the one or more pointer-size controlled add immediate instructions may comprise the [D] ADDIU [48] instruction described below with reference to FIG. 14 which adds a 32-bit immediate to a specified source register, and the [D] ADDIU [GP.W] instruction described below with reference to FIG. 16 which adds a 21-bit immediate to a specified source register.

In some cases, the one or more pointer-size controlled add immediate instructions comprise two or more pointer-size controlled add immediate instructions that identify the immediate in different units. For example, the one or more pointer-size controlled add immediate instructions may comprise a pointer-size controlled add immediate instruction that specifies the immediate in bytes and another pointer-size controlled add immediate instruction that specifies the immediate in words. For example, the one or more pointer-size controlled add immediate instructions may comprise the [D] ADDIU [GP.B] instruction described below with reference to FIG. 15 which specifies the immediate in bytes, and the [D] ADDIU [GP.W] instruction described below with reference to FIG. 16 which specifies the immediate in words.

Reference is now made to FIG. 4, which illustrates an example method 400 of decoding instructions at a data processing apparatus, such as the data processing apparatus 100 of FIG. 1 that implements an ISA that includes an instruction set with one or more pointer-size controlled memory access instructions of a first length (e.g. 16 bits). The method 400 begins at block 402 where the decode unit 104 receives an instruction for execution. As described above, the decode unit 104 may receive the instruction from a fetch unit which is configured to fetch instructions of a program in program order via the program counter. Once the decode unit 104 has received an instruction, the method 400 proceeds to block 404.

At block 404 the decode unit 104 decodes the received instruction. Decoding the instruction may comprise identifying the operation to be performed by the instruction and identifying the operands thereof. The decode unit may be configured to decode the received instruction by identifying a predetermined pattern of bits in the instruction from a plurality of predetermined patterns. For example, each different instruction may be identified by a unique pattern of bits in the instruction. Once the instruction has been decoded, the method 400 proceeds to block 406.

At block 406, the decode unit determines whether the received instruction is a memory access instruction. If the decode unit determines that the received instruction is a memory access instruction, then the method 400 proceeds to block 408. If, however, the decode unit determines that the received instruction is not a memory access instruction, then the method 400 may end or the method 400 may proceed to block 414. At block 408 the decode unit determines whether the received instruction is a pointer-size controlled instruction. If the decode unit determines that the received instruction is not pointer-size controlled, then the memory access instruction itself identifies the size of the memory access to be performed and the method proceeds to block 412. If, however, the decode unit determines that the received instruction is a pointer-size controlled instruction, then the method proceeds to block 410 where the decode unit dynamically determines or identifies the size of the memory access to be performed based on the current pointer size. In some cases, the decode unit may be configured to determine that the size of the memory to be accessed is a first size (e.g. 32 bits) when the current pointer size is the first size (e.g. 32 bits) (i.e. when the pointers are currently configured to be the first size) and a second size (e.g. 64 bits) when the current pointer size is the second size (e.g. 64 bits). Once the size of memory to be accessed has been identified/determined, then the method 400 proceeds to block 412 where the decode unit outputs one or more control signals which cause the execution unit to perform a memory access of the identified size.

At block 414, the decode unit determines if the received instruction is an arithmetic instruction. If the decode unit determines that the received instruction is an arithmetic instruction, then the method proceeds to block 416 where the decode unit determines if the received instruction is a pointer-size controlled instruction. If the decode unit determines that the received instruction is not a pointer-size controlled instruction, then the method proceeds directly to block 420. If, however, the decode unit determines that the received instruction is a pointer-size controlled instruction, then the method proceeds to block 418 where the decode unit dynamically determines or identifies the size of the data on which the arithmetic operation is to be performed based on the current pointer size. For example, the decode unit may be configured to determine that the data on which the arithmetic operation is to be performed is a first size (e.g. 32 bits) when the current pointer size is the first size (e.g. 32 bits) (i.e. when the pointers are currently configured to be the first size) and that the data on which the arithmetic operation is to be performed is a second size (e.g. 64 bits) when the current pointer size is the second size (e.g. 64 bits). Once the data size has been determined or identified, the method 400 proceeds to block 420 where the decode unit outputs one or more control signals to cause the execution unit to perform an arithmetic operation wherein the data is the identified size. The method 400 then ends.
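The decision flow of method 400 can be summarised in a short behavioural sketch. The `Decoded` record and the function `operation_size_bits` are hypothetical stand-ins for the output of block 404 and for the size selection at blocks 406 to 420; they are not part of the described apparatus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decoded:
    """Hypothetical result of decoding an instruction at block 404."""
    kind: str                                  # "memory", "arithmetic" or "other"
    pointer_size_controlled: bool = False
    explicit_size_bits: Optional[int] = None   # size named by the instruction itself


def operation_size_bits(instr: Decoded, pointer_size_bits: int) -> Optional[int]:
    """Return the data size to signal to the execution unit, or None."""
    if instr.kind == "memory":                 # block 406
        if instr.pointer_size_controlled:      # block 408
            return pointer_size_bits           # block 410, then block 412
        return instr.explicit_size_bits        # straight to block 412
    if instr.kind == "arithmetic":             # block 414
        if instr.pointer_size_controlled:      # block 416
            return pointer_size_bits           # block 418, then block 420
        return instr.explicit_size_bits        # straight to block 420
    return None                                # neither kind: method ends
```

For example, `operation_size_bits(Decoded("memory", True), 64)` selects a 64-bit access, while a load that names its own size is unaffected by the pointer size.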

Reference is now made to FIG. 5 which illustrates an example method 500 for determining the current pointer size which may be implemented by the pointer size logic 110 of FIG. 1. The method 500 begins at block 502 where the pointer size logic 110 determines whether the data processing apparatus supports 64-bit addressing. As described above, in some cases, the pointer size logic 110 may be configured to determine that the data processing apparatus does not support 64-bit addressing if the pointer size logic 110 determines from one or more configuration bits or settings that the data processing apparatus is a 32-bit processor and to determine that the data processing apparatus does support 64-bit addressing if the pointer size logic 110 determines from one or more configuration bits or settings that the data processing apparatus is a 64-bit or higher data processing apparatus. If the pointer size logic 110 determines that the data processing apparatus does not support 64-bit addressing, then the method 500 proceeds to block 504 where the pointer size is determined to be 32 bits. If, however, the pointer size logic 110 determines that the data processing apparatus does support 64-bit addressing, then the method 500 proceeds to block 506 where the pointer size logic 110 determines the size of the pointer based on the current operating mode of the data processing apparatus.

As shown in FIG. 5, determining the size of the pointer based on the current operating mode may comprise determining that the pointer size is 64 bits (at block 514) if it is determined at block 508 that the current operating mode is kernel mode and the KX configuration bit is set, or if it is determined at block 510 that the current operating mode is supervisor mode and the SX configuration bit is set, or if it is determined at block 512 that the current operating mode is user mode and the UX configuration bit is set; and determining the pointer size is 32 bits otherwise (block 516).
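The steps of method 500 can be sketched as a single function. The function name, the mode strings, and the representation of the KX, SX, and UX configuration bits as booleans are illustrative assumptions.

```python
def current_pointer_size_bits(supports_64bit: bool, mode: str,
                              kx: bool, sx: bool, ux: bool) -> int:
    """Return 32 or 64 following the flow of blocks 502 to 516."""
    if not supports_64bit:                     # block 502 -> block 504
        return 32
    # Blocks 508, 510 and 512: pick the configuration bit for the current mode.
    bit_for_mode = {"kernel": kx, "supervisor": sx, "user": ux}
    if mode not in bit_for_mode:
        raise ValueError("unknown operating mode: " + mode)
    # Block 514 if the bit for the current mode is set, otherwise block 516.
    return 64 if bit_for_mode[mode] else 32
```

Note that only the bit belonging to the current mode matters: with KX set and UX clear, kernel mode uses 64-bit pointers while user mode uses 32-bit pointers.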

It will be evident to a person of skill in the art that the method 500 of FIG. 5 is an example only and the pointer size logic 110 may be configured to determine the current size of the pointers in any suitable manner (e.g. from different configuration settings or information).

Reference is now made to FIGS. 6 to 20 which illustrate example pointer-size controlled instructions. Specifically, FIGS. 6 to 13 illustrate example pointer-size controlled memory access instructions and FIGS. 14 to 20 illustrate example pointer-size controlled arithmetic instructions. The instruction set may comprise any combination of these instructions. In these examples, binary values in FIGS. 6 to 20 represent specific bit patterns which the instruction must have to be decoded by the decode unit 104 as an instance of the instruction. The remaining fields are named instruction arguments or operands. Also, fields with names ending in square brackets such as “s[7:0]” and “s[0]” specify a particular range of bits for the named argument or operand, using a Verilog style syntax. A single argument or operand value may be split into more than one field in the instruction encoding, with the bit ranges specified by each field explicitly in this way. All non-specified bits will be set to zero (e.g. if s[32] is not specified then bit 32 will be set to zero). If no explicit bit range is specified, then the argument or operand represents the least significant bits of the value.
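The field notation described above can be illustrated with a small helper that rebuilds an operand from its fields. The helper name and the example field values are hypothetical; only the convention (Verilog style bit ranges, unspecified bits read as zero) comes from the description.

```python
def assemble_operand(fields):
    """Rebuild an operand from (high_bit, low_bit, field_value) tuples.

    A field named s[7:0] is passed as (7, 0, value) and a single-bit field
    s[8] as (8, 8, value). Bits covered by no field remain zero.
    """
    operand = 0
    for high_bit, low_bit, field_value in fields:
        width = high_bit - low_bit + 1
        if width <= 0 or field_value < 0 or field_value >> width:
            raise ValueError("field value does not fit its bit range")
        operand |= field_value << low_bit
    return operand


# An operand split across two fields: s[8] = 1 and s[7:0] = 0x2A.
split_operand = assemble_operand([(8, 8, 1), (7, 0, 0x2A)])
# An operand with only s[3:2] specified: bits 1 and 0 are zero.
partial_operand = assemble_operand([(3, 2, 0b11)])
```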

FIG. 6 illustrates an example 16-bit pointer-size controlled load word/double word instruction (L [W/D] [16]). This instruction causes a word (i.e. 32 bits) to be loaded from an identified memory address (base register+offset) into an identified destination register when the pointer size is 32 bits, and causes a double word (i.e. 64 bits) to be loaded from the identified memory address (base register+offset) into the identified destination register when the pointer size is 64 bits. This instruction uses six bits (bits 15-10) to identify the instruction as a L [W/D] [16] instruction, three bits (bits 9 to 7) to identify the destination register (rt3), three bits (bits 6 to 4) to identify the base register (rs3) and four bits to identify an offset (u). When the current pointer size is 32 bits, the last two bits of the offset may be set to zero so that the offset is specified in words. In contrast, when the current pointer size is 64 bits, the last three bits of the offset may be set to zero so that the offset is specified in double words. This effectively allows a 6-bit offset when the current pointer size is 32 bits and a 7-bit offset when the current pointer size is 64 bits.
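As an illustration of the field layout described for FIG. 6, the following sketch extracts the register fields and scales the offset according to the current pointer size. It is a hypothetical software model, not the decode unit 104: the function name `decode_lwd16` is invented here, the four offset bits are assumed to occupy bits 3 to 0 (the text above gives only their count), and the opcode pattern in bits 15-10 is not checked because its value is not given in the text.

```python
def decode_lwd16(instr: int, pointer_size: int) -> dict:
    """Decode the operand fields of a 16-bit L [W/D] [16] style encoding."""
    if pointer_size not in (32, 64):
        raise ValueError("pointer size must be 32 or 64 bits")
    rt3 = (instr >> 7) & 0x7   # destination register field, bits 9 to 7
    rs3 = (instr >> 4) & 0x7   # base register field, bits 6 to 4
    u_field = instr & 0xF      # offset field, ASSUMED to be bits 3 to 0
    # Offset is counted in words (x4) or double words (x8).
    shift = 2 if pointer_size == 32 else 3
    return {
        "rt3": rt3,
        "rs3": rs3,
        "offset_bytes": u_field << shift,
        "access_bits": pointer_size,
    }
```

With all four offset bits set, this model yields a byte offset of 60 for 32-bit pointers and 120 for 64-bit pointers, which fit in 6 and 7 bits respectively.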

FIG. 7 illustrates an example 16-bit pointer-size controlled load word/double word stack pointer relative instruction (L [W/D] [SP]). This instruction causes a word (i.e. 32 bits) to be loaded from an identified memory address (stack pointer+offset) into an identified destination register when the pointer size is 32 bits, and causes a double word (i.e. 64 bits) to be loaded from the identified memory address (stack pointer+offset) into the identified destination register when the pointer size is 64 bits. This instruction implicitly identifies the stack pointer register as the base register and thus this instruction uses six bits (bits 15-10) to identify the instruction as a L [W/D] [SP] instruction, five bits (bits 9 to 5) to identify the destination register (rt), and five bits to identify an offset (u). When the current pointer size is 32 bits, the last two bits of the offset may be set to zero so that the offset is specified in words. In contrast, when the current pointer size is 64 bits, the last three bits of the offset may be set to zero so that the offset is specified in double words. This effectively allows a 7-bit offset when the current pointer size is 32 bits and an 8-bit offset when the current pointer size is 64 bits.
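The effect of a stack pointer relative load can be modelled against a flat byte array as sketched below. This is an assumption-laden illustration: the function name `load_sp_relative` is hypothetical, memory is modelled as a Python `bytes` object, and little-endian byte order is assumed because the text above does not specify endianness.

```python
def load_sp_relative(memory: bytes, sp: int, u_field: int, pointer_size: int) -> int:
    """Model an L [W/D] [SP] style load from address (stack pointer + scaled offset)."""
    if pointer_size not in (32, 64):
        raise ValueError("pointer size must be 32 or 64 bits")
    if not 0 <= u_field < 32:
        raise ValueError("offset field is five bits wide")
    shift = 2 if pointer_size == 32 else 3   # words or double words
    size_bytes = pointer_size // 8           # 4-byte word or 8-byte double word
    addr = sp + (u_field << shift)
    # ASSUMPTION: little-endian byte order.
    return int.from_bytes(memory[addr:addr + size_bytes], "little")
```

The same encoded offset field therefore reaches a different address, and reads a different amount of data, depending only on the current pointer size.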

FIG. 8 illustrates an example 16-bit pointer-size controlled load word/double word global pointer relative instruction (L [W/D] [GP16]). This instruction causes a word (i.e. 32 bits) to be loaded from an identified memory address (global pointer+offset) into an identified destination register when the pointer size is 32 bits, and causes a double word (i.e. 64 bits) to be loaded from the identified memory address (global pointer+offset) into the identified destination register when the pointer size is 64 bits. This instruction implicitly identifies the global pointer register as the base register and thus this instruction uses six bits (bits 15-10) to identify the instruction as a L [W/D] [GP16] instruction, three bits (bits 9 to 7) to identify the destination register (rt3), and seven bits (bits 6 to 0) to identify an offset (u). When the current pointer size is 32 bits, the last two bits of the offset may be set to zero so that the offset is specified in words. In contrast, when the current pointer size is 64 bits, the last three bits of the offset may be set to zero so that the offset is specified in double words. This effectively allows a 9-bit offset when the current pointer size is 32 bits and a 10-bit offset when the current pointer size is 64 bits.

FIG. 9 illustrates a second example 16-bit pointer-size controlled load word/double word instruction (L [W/D] [4×4]) which uses a different number of bits to identify the base and destination registers compared to the instruction described with reference to FIG. 6. This instruction causes a word (i.e. 32 bits) to be loaded from an identified memory address (base register+offset) into an identified destination register when the pointer size is 32 bits, and causes a double word (i.e. 64 bits) to be loaded from the identified memory address (base register+offset) into the identified destination register when the pointer size is 64 bits. This instruction uses six bits (bits 15-10) to identify the instruction as a L [W/D] [4×4] instruction, four bits (bits 9 and 7-5) to identify the destination register (rt4), four bits (bits 4 and 2 to 0) to identify the base register (rs4) and two bits (bits 8 and 3) to identify an offset (u). When the current pointer size is 32 bits, the last two bits of the offset may be set to zero so that the offset is specified in words. In contrast, when the current pointer size is 64 bits, the last three bits of the offset may be set to zero so that the offset is specified in double words. This effectively allows a 4-bit offset when the current pointer size is 32 bits and a 5-bit offset when the current pointer size is 64 bits.
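Because the fields of the FIG. 9 encoding are split across non-contiguous bits, reassembling them takes a little more work than for FIG. 6. The following sketch shows one possible reassembly. The function name `decode_lwd_4x4` is hypothetical, and the ordering of the split pieces is an assumption: the first-listed bit of each field (bit 9, bit 4 and bit 8 respectively) is treated as the most significant bit, which the text above does not state.

```python
def decode_lwd_4x4(instr: int, pointer_size: int) -> tuple:
    """Reassemble the split fields of an L [W/D] [4x4] style encoding.

    Returns (rt4, rs4, offset_bytes).
    """
    if pointer_size not in (32, 64):
        raise ValueError("pointer size must be 32 or 64 bits")

    def bit(n: int) -> int:
        return (instr >> n) & 1

    # ASSUMPTION: bits 9, 4 and 8 are the most significant bits of their fields.
    rt4 = (bit(9) << 3) | ((instr >> 5) & 0x7)   # bit 9 and bits 7 to 5
    rs4 = (bit(4) << 3) | (instr & 0x7)          # bit 4 and bits 2 to 0
    u_field = (bit(8) << 1) | bit(3)             # bits 8 and 3
    shift = 2 if pointer_size == 32 else 3       # words or double words
    return rt4, rs4, u_field << shift
```

With both offset bits set, the model gives a byte offset of 12 for 32-bit pointers and 24 for 64-bit pointers, matching the 4-bit and 5-bit effective offsets stated above.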

FIGS. 10-13 illustrate store instructions that correspond to the load instructions of FIGS. 6 to 9. Specifically, FIG. 10 illustrates an example 16-bit pointer-size controlled store word/double word instruction (S [W/D] [16]). This instruction causes a word (i.e. 32 bits) to be stored from an identified source register to an identified memory address (base register+offset) when the pointer size is 32 bits, and causes a double word (i.e. 64 bits) to be stored from the identified source register to the identified memory address (base register+offset) when the pointer size is 64 bits. This instruction uses six bits (bits 15-10) to identify the instruction as a S [W/D] [16] instruction, three bits (bits 9 to 7) to identify the source register (rtz3), three bits (bits 6 to 4) to identify the base register (rs3) and four bits to identify an offset (u). When the current pointer size is 32 bits, the last two bits of the offset may be set to zero so that the offset is specified in words. In contrast, when the current pointer size is 64 bits, the last three bits of the offset may be set to zero so that the offset is specified in double words. This effectively allows a 6-bit offset when the current pointer size is 32 bits and a 7-bit offset when the current pointer size is 64 bits.

FIG. 11 illustrates an example 16-bit pointer-size controlled store word/double word stack pointer relative instruction (S [W/D] [SP]). This instruction causes a word (i.e. 32 bits) to be stored from an identified source register to an identified memory address (stack pointer+offset) when the pointer size is 32 bits, and causes a double word (i.e. 64 bits) to be stored from an identified source register to an identified memory address (stack pointer+offset) when the pointer size is 64 bits. This instruction implicitly identifies the stack pointer register as the base register and thus this instruction uses six bits (bits 15-10) to identify the instruction as a S [W/D] [SP] instruction, five bits (bits 9 to 5) to identify the source register (rt), and five bits to identify an offset (u). When the current pointer size is 32 bits, the last two bits of the offset may be set to zero so that the offset is specified in words. In contrast, when the current pointer size is 64 bits, the last three bits of the offset may be set to zero so that the offset is specified in double words. This effectively allows a 7-bit offset when the current pointer size is 32 bits and an 8-bit offset when the current pointer size is 64 bits.

FIG. 12 illustrates an example 16-bit pointer-size controlled store word/double word global pointer relative instruction (S [W/D] [GP16]). This instruction causes a word (i.e. 32 bits) to be stored from an identified source register to an identified memory address (global pointer+offset) when the pointer size is 32 bits, and causes a double word (i.e. 64 bits) to be stored from the identified source register to the identified memory address (global pointer+offset) when the pointer size is 64 bits. This instruction implicitly identifies the global pointer register as the base register and thus this instruction uses six bits (bits 15-10) to identify the instruction as a S [W/D] [GP16] instruction, three bits (bits 9 to 7) to identify the source register (rtz3), and seven bits (bits 6 to 0) to identify an offset (u). When the current pointer size is 32 bits, the last two bits of the offset may be set to zero so that the offset is specified in words. In contrast, when the current pointer size is 64 bits, the last three bits of the offset may be set to zero so that the offset is specified in double words. This effectively allows a 9-bit offset when the current pointer size is 32 bits and a 10-bit offset when the current pointer size is 64 bits.

FIG. 13 illustrates a second example 16-bit pointer-size controlled store word/double word instruction (S [W/D] [4×4]) which uses a different number of bits to identify the base and source registers compared to the instruction described with reference to FIG. 10. This instruction causes a word (i.e. 32 bits) to be stored from an identified source register to an identified memory address (base register+offset) when the pointer size is 32 bits, and causes a double word (i.e. 64 bits) to be stored from the identified source register to the identified memory address (base register+offset) when the pointer size is 64 bits. This instruction uses six bits (bits 15-10) to identify the instruction as a S [W/D] [4×4] instruction, four bits (bits 9 and 7-5) to identify the source register (rtz4), four bits (bits 4 and 2 to 0) to identify the base register (rs4), and two bits (bits 8 and 3) to identify an offset (u). When the current pointer size is 32 bits, the last two bits of the offset may be set to zero so that the offset is specified in words. In contrast, when the current pointer size is 64 bits, the last three bits of the offset may be set to zero so that the offset is specified in double words. This effectively allows a 4-bit offset when the current pointer size is 32 bits and a 5-bit offset when the current pointer size is 64 bits.
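The effective offset widths quoted for the load and store forms of FIGS. 6 to 13 all follow from the same arithmetic: the encoded field width plus the number of implied zero bits. The sketch below tabulates that relationship. The dictionary and function names are hypothetical; the field widths are those stated in the description above.

```python
# Encoded offset field width, in bits, for each instruction form described above.
OFFSET_FIELD_BITS = {"[16]": 4, "[SP]": 5, "[GP16]": 7, "[4x4]": 2}


def implied_zero_bits(pointer_size: int) -> int:
    """Two implied zero bits for word offsets, three for double word offsets."""
    if pointer_size not in (32, 64):
        raise ValueError("pointer size must be 32 or 64 bits")
    return 2 if pointer_size == 32 else 3


def effective_offset_bits(field_bits: int, pointer_size: int) -> int:
    """Width of the byte offset once the implied zero bits are appended."""
    return field_bits + implied_zero_bits(pointer_size)


def max_offset_bytes(field_bits: int, pointer_size: int) -> int:
    """Largest byte offset reachable with an unsigned field of the given width."""
    return ((1 << field_bits) - 1) << implied_zero_bits(pointer_size)
```

Under this model the global pointer relative forms reach at most 508 bytes with 32-bit pointers and 1016 bytes with 64-bit pointers, while the 4×4 forms reach 12 and 24 bytes respectively.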

FIGS. 14 to 20 illustrate example pointer-size controlled arithmetic instructions. FIG. 14 illustrates an example 48-bit pointer-size controlled (double) add immediate global pointer instruction ([D] ADDIU [GP48]). This instruction causes an immediate to be added to the global pointer register and stored as a 32-bit value in an identified destination register when the pointer size is 32 bits, and causes an immediate to be added to the global pointer register and stored as a 64-bit value in the identified destination register when the pointer size is 64 bits. This instruction implicitly identifies the global pointer register as the source register and thus this instruction uses 11 bits (bits 47-42 and 36-32) to identify the instruction as a [D] ADDIU [GP48] instruction, five bits (bits 41 to 37) to identify the destination register (rt), and 32 bits (bits 31 to 0) to identify a 32-bit signed immediate (s).
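The pointer-size dependence of the result width for the FIG. 14 instruction can be sketched as follows. This is an illustrative model only: the names `sign_extend` and `addiu_gp48` are hypothetical, and wrap-around (modular) arithmetic at the result width is an assumption, since the text above does not describe overflow behaviour.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def addiu_gp48(gp: int, imm32: int, pointer_size: int) -> int:
    """Add a 32-bit signed immediate to the global pointer value."""
    if pointer_size not in (32, 64):
        raise ValueError("pointer size must be 32 or 64 bits")
    result = gp + sign_extend(imm32, 32)
    # ASSUMPTION: the result wraps around at the pointer width (32 or 64 bits).
    return result & ((1 << pointer_size) - 1)
```

The same immediate field thus produces a 32-bit address in one configuration and a 64-bit address in the other without requiring two separate opcodes.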

FIG. 15 illustrates an example 32-bit pointer-size controlled (double) add immediate global pointer byte instruction ([D] ADDIU [GP.B]). This instruction causes an immediate (specified in bytes) to be added to the global pointer register and stored as a 32-bit value in an identified destination register when the pointer size is 32 bits, and causes an immediate (specified in bytes) to be added to the global pointer register and stored as a 64-bit value in the identified destination register when the pointer size is 64 bits. This instruction implicitly identifies the global pointer register as the source register and thus this instruction uses nine bits (bits 31-26 and 20-18) to identify the instruction as a [D] ADDIU [GP.B] instruction, five bits (bits 25 to 21) to identify the destination register (rt), and 18 bits (bits 17 to 0) to identify an 18-bit immediate (u).

FIG. 16 illustrates an example 32-bit pointer-size controlled (double) add immediate global pointer word instruction ([D] ADDIU [GP.W]). This instruction causes an immediate (specified in words) to be added to the global pointer register and stored as a 32-bit value in an identified destination register when the pointer size is 32 bits, and causes an immediate (specified in words) to be added to the global pointer register and stored as a 64-bit value in the identified destination register when the pointer size is 64 bits. This instruction uses eight bits (bits 31-26 and 1-0) to identify the instruction as a [D] ADDIU [GP.W] instruction, five bits (bits 25 to 21) to identify the destination register (rt), and nineteen bits (bits 20 to 2) to identify a 21-bit immediate (the last two bits of the offset are set to zero).

FIG. 17 illustrates an example 16-bit pointer-size controlled (double) add immediate stack pointer instruction ([D] ADDIU [R1.SP]). This instruction causes an immediate to be added to the stack pointer register and stored as a 32-bit value in an identified destination register when the pointer size is 32 bits, and causes an immediate to be added to the stack pointer register and stored as a 64-bit value in the identified destination register when the pointer size is 64 bits. This instruction implicitly identifies the stack pointer register as the source register and thus this instruction uses seven bits (bits 15-10 and 6) to identify the instruction as a [D] ADDIU [R1.SP] instruction, three bits (bits 9 to 7) to identify the destination register (rt3), and six bits (bits 5 to 0) to identify an 8-bit immediate (u) (the last two bits of the offset are set to zero).
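The relationship between the six encoded bits and the 8-bit immediate of FIG. 17 can be illustrated with the short sketch below. The function name `decode_addiu_r1_sp` is hypothetical, and the opcode bits (bits 15-10 and 6) are ignored because their values are not given in the text above.

```python
def decode_addiu_r1_sp(instr: int) -> tuple:
    """Extract (rt3, immediate) from a [D] ADDIU [R1.SP] style encoding."""
    rt3 = (instr >> 7) & 0x7     # destination register field, bits 9 to 7
    u_field = instr & 0x3F       # encoded immediate, bits 5 to 0
    # Appending two zero bits turns the 6-bit field into an 8-bit immediate.
    return rt3, u_field << 2
```

The largest immediate under this model is 252, which is the largest multiple of four that fits in eight bits.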

FIG. 18 illustrates an example 32-bit pointer-size controlled (double) add immediate program counter instruction ([D] ADDIUPC [32]). This instruction causes an immediate to be added to the program counter register and stored as a 32-bit value in an identified destination register when the pointer size is 32 bits, and causes an immediate to be added to the program counter register and stored as a 64-bit value in the identified destination register when the pointer size is 64 bits. This instruction implicitly identifies the program counter register as the source register and thus this instruction uses six bits (bits 31-26) to identify the instruction as a [D] ADDIUPC [32] instruction, five bits (bits 25 to 21) to identify the destination register (rt), and twenty-one bits (bits 20 to 0) to identify a 22-bit signed immediate (s) (the last bit is set to zero).

FIG. 19 illustrates an example 48-bit pointer-size controlled (double) add immediate program counter instruction ([D] ADDIUPC [48]). This instruction causes an immediate to be added to the program counter register and stored as a 32-bit value in an identified destination register when the pointer size is 32 bits, and causes an immediate to be added to the program counter register and stored as a 64-bit value in the identified destination register when the pointer size is 64 bits. This instruction implicitly identifies the program counter register as the source register and thus this instruction uses eleven bits (bits 47-42 and 36-32) to identify the instruction as a [D] ADDIUPC [48] instruction, five bits (bits 41 to 37) to identify the destination register (rt), and thirty-two bits (bits 31 to 0) to identify a 32-bit signed immediate (s).

FIG. 20 illustrates a second example 32-bit pointer-size controlled (double) add immediate program counter instruction ([D] ALUIPC). This instruction causes an aligned address at an upper 20-bit immediate offset from the program counter to be calculated and stored as a 32-bit value in an identified destination register (rt) when the pointer size is 32 bits, and causes an aligned address at an upper 20-bit immediate offset from the program counter to be calculated and stored as a 64-bit value in the identified destination register (rt) when the pointer size is 64 bits. This instruction uses seven bits (bits 31-26 and 1) to identify the instruction as a [D] ALUIPC instruction, five bits (bits 25 to 21) to identify the destination register (rt), and twenty bits (bits 20-2 and 0) to identify a 20-bit immediate (s).
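One possible reading of the aligned upper-immediate calculation of FIG. 20 is sketched below. Several points are assumptions that the text above does not confirm: that the 20-bit immediate supplies bits 31 to 12 of a signed 32-bit offset, that "aligned" means the low 12 bits of the result are cleared, and that the result wraps at the pointer width. The names `sign_extend` and `aluipc` are hypothetical, and `pc` stands for whichever program counter value an implementation uses as the base.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def aluipc(pc: int, imm20: int, pointer_size: int) -> int:
    """Compute an aligned address at an upper-immediate offset from `pc`."""
    if pointer_size not in (32, 64):
        raise ValueError("pointer size must be 32 or 64 bits")
    # ASSUMPTION: the immediate occupies bits 31 to 12 of a signed offset.
    offset = sign_extend((imm20 & 0xFFFFF) << 12, 32)
    width_mask = (1 << pointer_size) - 1
    # ASSUMPTION: alignment clears the low 12 bits of the result.
    return (pc + offset) & width_mask & ~0xFFF
```

Under these assumptions the instruction yields the base of a 4 KiB-aligned region, to which a subsequent load or store could add a small offset.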

The data processing apparatus of FIG. 1 is shown as comprising a number of functional blocks. This is schematic only and is not intended to define a strict division between different logic elements of such entities. Each functional block may be provided in any suitable manner. It is to be understood that intermediate values described herein as being formed by the data processing apparatus need not be physically generated by the data processing apparatus at any point and may merely represent logical values which conveniently describe the processing performed by the data processing apparatus between its input and output.

The data processing apparatus described herein may be embodied in hardware on an integrated circuit. The data processing apparatus described herein may be configured to perform any of the methods described herein. Generally, any of the functions, methods, techniques or components described above can be implemented in software, firmware, hardware (e.g., fixed logic circuitry), or any combination thereof. The terms “module,” “functionality,” “component”, “element”, “unit”, “block”, and “logic” may be used herein to generally represent software, firmware, hardware, or any combination thereof. In the case of a software implementation, the module, functionality, component, element, unit, block, or logic represents program code that performs the specified tasks when executed on a processor. The algorithms and methods described herein could be performed by one or more processors executing code that causes the processor(s) to perform the algorithms/methods. Examples of a computer-readable storage medium include a random-access memory (RAM), read-only memory (ROM), an optical disc, flash memory, hard disk memory, and other memory devices that may use magnetic, optical, and other techniques to store instructions or other data and that can be accessed by a machine.

The terms “computer program code” and “computer readable instructions” as used herein refer to any kind of executable code for processors, including code expressed in a machine language, an interpreted language, or a scripting language. Executable code includes binary code, machine code, bytecode, code defining an integrated circuit (such as a hardware description language or netlist), and code expressed in a programming language such as C, Java or OpenCL. Executable code may be, for example, any kind of software, firmware, script, module or library which, when suitably processed, interpreted, compiled, or executed at a virtual machine or other software environment, causes a processor of the computer system at which the executable code is supported to perform the tasks specified by the code.

It is also intended to encompass software which defines a configuration of hardware as described herein, such as HDL (hardware description language) software, as is used for designing integrated circuits, or for configuring programmable chips, to carry out desired functions. That is, there may be provided a computer readable storage medium having encoded thereon computer readable program code in the form of an integrated circuit definition dataset that when processed (i.e. run) in an integrated circuit manufacturing system configures the system to manufacture a data processing apparatus configured to perform any of the methods described herein, or to manufacture a data processing apparatus comprising any apparatus described herein. An integrated circuit definition dataset may be, for example, an integrated circuit description.

Therefore, there may be provided a method of manufacturing, at an integrated circuit manufacturing system, a data processing apparatus as described herein. Furthermore, there may be provided an integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, causes the method of manufacturing a data processing apparatus to be performed.

An integrated circuit definition dataset may be in the form of computer code, for example as a netlist, as code for configuring a programmable chip, as a hardware description language defining hardware suitable for manufacture in an integrated circuit at any level, including as register transfer level (RTL) code, as high-level circuit representations such as Verilog or VHDL, and as low-level circuit representations such as OASIS® and GDSII. Higher level representations which logically define hardware suitable for manufacture in an integrated circuit (such as RTL) may be processed at a computer system configured for generating a manufacturing definition of an integrated circuit in the context of a software environment comprising definitions of circuit elements and rules for combining those elements in order to generate the manufacturing definition of an integrated circuit so defined by the representation. As is typically the case with software executing at a computer system so as to define a machine, one or more intermediate user steps (e.g. providing commands, variables etc.) may be required in order for a computer system configured for generating a manufacturing definition of an integrated circuit to execute code defining an integrated circuit so as to generate the manufacturing definition of that integrated circuit.

An example of processing an integrated circuit definition dataset at an integrated circuit manufacturing system so as to configure the system to manufacture a data processing apparatus will now be described with respect to FIG. 21.

FIG. 21 shows an example of an integrated circuit (IC) manufacturing system 2102 which is configured to manufacture a data processing apparatus as described in any of the examples herein. In particular, the IC manufacturing system 2102 comprises a layout processing system 2104 and an integrated circuit generation system 2106. The IC manufacturing system 2102 is configured to receive an IC definition dataset (e.g. defining a data processing apparatus as described in any of the examples herein), process the IC definition dataset, and generate an IC according to the IC definition dataset (e.g. which embodies a data processing apparatus as described in any of the examples herein). The processing of the IC definition dataset configures the IC manufacturing system 2102 to manufacture an integrated circuit embodying a data processing apparatus as described in any of the examples herein.

The layout processing system 2104 is configured to receive and process the IC definition dataset to determine a circuit layout. Methods of determining a circuit layout from an IC definition dataset are known in the art, and for example may involve synthesizing RTL code to determine a gate level representation of a circuit to be generated, e.g. in terms of logical components (e.g. NAND, NOR, AND, OR, MUX and FLIP-FLOP components). A circuit layout can be determined from the gate level representation of the circuit by determining positional information for the logical components. This may be done automatically or with user involvement in order to optimize the circuit layout. When the layout processing system 2104 has determined the circuit layout, it may output a circuit layout definition to the IC generation system 2106. A circuit layout definition may be, for example, a circuit layout description.

The IC generation system 2106 generates an IC according to the circuit layout definition, as is known in the art. For example, the IC generation system 2106 may implement a semiconductor device fabrication process to generate the IC, which may involve a multiple-step sequence of photolithographic and chemical processing steps during which electronic circuits are gradually created on a wafer made of semiconducting material. The circuit layout definition may be in the form of a mask which can be used in a lithographic process for generating an IC according to the circuit definition. Alternatively, the circuit layout definition provided to the IC generation system 2106 may be in the form of computer-readable code which the IC generation system 2106 can use to form a suitable mask for use in generating an IC.

The different processes performed by the IC manufacturing system 2102 may be implemented all in one location, e.g. by one party. Alternatively, the IC manufacturing system 2102 may be a distributed system such that some of the processes may be performed at different locations, and may also be performed by different parties. For example, some of the stages of: (i) synthesizing RTL code representing the IC definition dataset to form a gate level representation of a circuit to be generated, (ii) generating a circuit layout based on the gate level representation, (iii) forming a mask in accordance with the circuit layout, and (iv) fabricating an integrated circuit using the mask, may be performed in different locations and/or by different parties.

In other examples, processing of the integrated circuit definition dataset at an integrated circuit manufacturing system may configure the system to manufacture a data processing apparatus without the IC definition dataset being processed so as to determine a circuit layout. For instance, an integrated circuit definition dataset may define the configuration of a reconfigurable processor, such as an FPGA, and the processing of that dataset may configure an IC manufacturing system to generate a reconfigurable processor having that defined configuration (e.g. by loading configuration data to the FPGA).

In some embodiments, an integrated circuit manufacturing definition dataset, when processed in an integrated circuit manufacturing system, may cause an integrated circuit manufacturing system to generate a device as described herein. For example, the configuration of an integrated circuit manufacturing system in the manner described above with respect to FIG. 21 by an integrated circuit manufacturing definition dataset may cause a device as described herein to be manufactured.

In some examples, an integrated circuit definition dataset could include software which runs on hardware defined at the dataset or in combination with hardware defined at the dataset. In the example shown in FIG. 21, the IC generation system may further be configured by an integrated circuit definition dataset to, on manufacturing an integrated circuit, load firmware onto that integrated circuit in accordance with program code defined at the integrated circuit definition dataset or otherwise provide program code with the integrated circuit for use with the integrated circuit.

The implementation of concepts set forth in this application in devices, apparatus, modules, and/or systems (as well as in methods implemented herein) may give rise to performance improvements when compared with known implementations. The performance improvements may include one or more of increased computational performance, reduced latency, increased throughput, and/or reduced power consumption. During manufacture of such devices, apparatus, modules, and systems (e.g. in integrated circuits), performance improvements can be traded-off against the physical implementation, thereby improving the method of manufacture. For example, a performance improvement may be traded against layout area, thereby matching the performance of a known implementation but using less silicon. This may be done, for example, by reusing functional blocks in a serialized fashion or sharing functional blocks between elements of the devices, apparatus, modules, and/or systems. Conversely, concepts set forth in this application that give rise to improvements in the physical implementation of the devices, apparatus, modules, and systems (such as reduced silicon area) may be traded for improved performance. This may be done, for example, by manufacturing multiple instances of a module within a predefined area budget.

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being executed based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.

FIG. 22 is a flow diagram for pointer-size controlled instruction processing. The pointer-size controlled instruction processing can include decoding instructions at a data processing apparatus. The flow 2200 includes receiving an instruction for execution 2210 by an execution unit of the data processing apparatus. The execution unit of the data processing apparatus can be configured within the data processing apparatus, where the data processing apparatus can include a reconfigurable fabric. The data processing apparatus can be based on a variety of architectures. The architectures can include a control flow or Von Neumann architecture, a data flow architecture, etc. In embodiments, the architecture of the data processing apparatus can include an instruction set architecture (ISA). In embodiments, the data processing apparatus can be a reduced instruction set computer (RISC). The data processing apparatus architecture can be based on other types of instruction set computers such as a complex instruction set computer (CISC). The received instruction can be an instruction from a variable length instruction set including one or more instructions of a first length and one or more instructions of a second length, where the first length is shorter than the second length. An instruction of the first length or an instruction of the second length can include an access to a memory for loading, storing, manipulating, comparing, or analyzing, etc., a variety of data types. The data types can include bits, bytes, integers, reals, floats, characters, strings, words, double words, etc. The one or more instructions of the first length can include one or more pointer-size controlled memory access instructions. Pointers of different sizes can be used to access memory using a variety of techniques such as addressing techniques. Addressing techniques based on pointers of different sizes can include an index, reference, or offset addressing, indirect addressing, and so on. In embodiments, the pointer can be one of a global data pointer, a stack pointer, and a program counter.

The flow 2200 includes determining that the received instruction is one of the one or more pointer-size controlled memory access instructions 2220. The execution unit of a processing apparatus can be configured to execute a variety of types of codes or instructions. The instructions can be categorized broadly by the functions or operations performed by the instructions. The functions or operations can be based on data transfer instructions, data manipulation instructions, program or code control instructions, and so on. Instructions such as data transfer instructions typically include loading data and storing data.

In embodiments, the one or more pointer-size controlled memory access instructions include one or more pointer-size controlled load instructions. The load instructions can load data, instructions, etc., from memory, where the memory can be local memory, direct memory access (DMA) memory, external memory, shared memory, etc. A pointer-size controlled load instruction can use a base register. In embodiments, the one or more pointer-size controlled load instructions can include at least two pointer-size controlled load instructions that identify a base register in different manners. A base register can be used for various purposes including monitoring offsets to data, local variables, and so on, as a function, subroutine, procedure, instruction, etc., is being executed. The base register can be used for a stack, a stack frame, etc. The base register can be identified using various techniques. In embodiments, the at least two pointer-size controlled load instructions that identify the base register in different manners can include a pointer-size controlled load instruction that explicitly identifies the base register and a pointer-size controlled load instruction that implicitly identifies the base register. Explicit identification of the base register can include using or configuring an identifier for the base register within an instruction. Implicit identification of the base register can include using an instruction whose operation implies a particular base register, thus removing the need to identify the register within the instruction. By implicitly identifying the base register, the bits within the instruction that would otherwise have been used to identify the base register can be used for other purposes, such as additional address bits. In other embodiments, the one or more pointer-size controlled load instructions can include at least two pointer-size controlled load instructions that identify a base register using a different number of bits. The different number of bits may result from explicitly identifying the base register (e.g. more bits), implicitly identifying the base register (e.g. fewer bits), etc.
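By way of non-limiting illustration, the difference between explicit and implicit base register identification can be sketched as follows. The two 16-bit encodings, the opcode values, the field widths, and the implied register name "gp" are hypothetical and are assumed here only for illustration; they are not encodings defined by this disclosure.

```python
# Hypothetical 16-bit encodings, invented for illustration only.
# Explicit form: [15:12] opcode | [11:9] dest | [8:6] base | [5:0] offset
# Implicit form: [15:12] opcode | [11:9] dest | [8:0] offset (base is implied)

OP_LOAD_EXPLICIT = 0x1  # assumed opcode value
OP_LOAD_IMPLICIT = 0x2  # assumed opcode value
IMPLIED_BASE = "gp"     # assumed implied base register


def decode_load(word: int) -> dict:
    """Decode one of the two hypothetical pointer-size controlled loads."""
    opcode = (word >> 12) & 0xF
    dest = (word >> 9) & 0x7
    if opcode == OP_LOAD_EXPLICIT:
        # Three bits name the base register, leaving six offset bits.
        base = f"r{(word >> 6) & 0x7}"
        offset = word & 0x3F
    elif opcode == OP_LOAD_IMPLICIT:
        # No base field: the freed bits widen the offset to nine bits.
        base = IMPLIED_BASE
        offset = word & 0x1FF
    else:
        raise ValueError(f"not a pointer-size controlled load: {word:#06x}")
    return {"dest": f"r{dest}", "base": base, "offset": offset}


explicit = decode_load((0x1 << 12) | (2 << 9) | (5 << 6) | 0x3F)
implicit = decode_load((0x2 << 12) | (2 << 9) | 0x1FF)
```

In this sketch the implicit form reaches offsets up to 511 units while the explicit form reaches only 63, which reflects the repurposing of base register bits as additional address bits described above.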

In other embodiments, the one or more pointer-size controlled memory access instructions include one or more pointer-size controlled store instructions. As discussed throughout, the store instructions can be used for storing data in local memory, DMA memory, external memory in communication with the execution unit, and so on. As for the load instructions, the one or more pointer-size controlled store instructions can include at least two pointer-size controlled store instructions that identify a base register in different manners. In further embodiments, the at least two pointer-size controlled store instructions that identify the base register in different manners can include a pointer-size controlled store instruction that explicitly identifies the base register and a pointer-size controlled store instruction that implicitly identifies the base register. The explicit identification can include bits within the store instruction that identify the base register, while the implicit identification can include using an instruction that uses the base register. As for the load instructions, the one or more pointer-size controlled store instructions can include at least two pointer-size controlled store instructions that identify a base register using a different number of bits.

For the data transfer instructions to be executed properly, the amount of data to be transferred must be indicated. As discussed below, a pointer-size controlled memory access instruction can include a unit, where the unit can include a byte, a word, a double word, and so on. The flow 2200 includes dynamically determining a size of data 2230 to be accessed based on a current pointer size. The size of data to be accessed can include a number of bytes of data, words of data, double words of data, and the like. In embodiments, dynamically determining the size of data to be accessed based on the current pointer size can include dynamically determining the size of data to be accessed to be a first size 2232 when the current pointer size is the first size. Dynamically determining the size of data to be accessed can also include dynamically determining the size of data to be accessed to be a second size 2234 when the current pointer size is the second size. In embodiments, the size of data to be accessed for a pointer of the first size can be expressed in bytes or words; the size of data to be accessed for a pointer of the second size can be expressed in words or double words; and so on. In further embodiments, the one or more instructions of the second length can include a corresponding memory access instruction 2236 to one of the one or more pointer-size controlled memory access instructions. The memory access instruction can include a load, a store, etc. The corresponding memory access instruction can identify that data of the first size is to be accessed 2238, where data of the first size can be based on bytes, words, etc.
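The size determination described above can be illustrated with a minimal sketch. The sketch assumes, as one example only, that the first size is 32 bits and the second size is 64 bits; the function name and the instruction kind labels are hypothetical.

```python
FIRST_SIZE_BITS = 32   # assumed first size, for illustration
SECOND_SIZE_BITS = 64  # assumed second size, for illustration


def data_size_bits(kind: str, current_pointer_size: int) -> int:
    """Return the size of data to be accessed, in bits.

    kind is a hypothetical label:
      "pointer_size_controlled": a shorter instruction whose access size
                                 follows the current pointer size
      "corresponding":           a longer instruction that always
                                 identifies data of the first size
    """
    if current_pointer_size not in (FIRST_SIZE_BITS, SECOND_SIZE_BITS):
        raise ValueError("unsupported pointer size")
    if kind == "pointer_size_controlled":
        # First size when the pointer is the first size,
        # second size when the pointer is the second size.
        return current_pointer_size
    if kind == "corresponding":
        # The access size is fixed regardless of the pointer size.
        return FIRST_SIZE_BITS
    raise ValueError(f"unknown instruction kind: {kind}")
```

Under these assumptions, the same shorter instruction accesses 32 bits when the current pointer size is 32 bits and 64 bits when the current pointer size is 64 bits, while the corresponding longer instruction accesses 32 bits in both cases.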

The flow 2200 includes outputting one or more control signals 2240. The one or more control signals that can be output can include configuration bits, flags, select signals, enable signals, or other signals that can be used for controlling memory access. The one or more control signals can be output based on other determinations such as determining that the received instruction is a memory access instruction, a pointer-size controlled instruction, etc. The received instruction can include other types of instructions such as data manipulation instructions. The one or more instructions of the first length and/or the one or more instructions of the second length can include one or more pointer-size controlled arithmetic instructions. The flow 2200 can further include, in response to determining that the received instruction is one of the one or more pointer-size controlled arithmetic instructions, dynamically determining a size of data upon which an arithmetic operation 2242 is to be performed based on the current pointer size. The flow can then include outputting one or more control signals that cause the execution unit to perform an arithmetic operation on data of the determined size. In embodiments, the one or more pointer-size controlled arithmetic instructions can include at least one pointer-size controlled arithmetic instruction where the data upon which the arithmetic operation is performed is a pointer. The arithmetic operation on the pointer can include adding, subtracting, incrementing, decrementing, etc. Embodiments further include, in response to determining that the received instruction is the corresponding memory access instruction, outputting one or more control signals that cause the execution unit to perform a memory access of the first size of data. The first size of data can include bytes, words, double words, and so on.
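As a non-limiting illustration of a pointer-size controlled arithmetic instruction operating on a pointer, the following sketch performs an addition at the width of the current pointer size. The function name is hypothetical, and the wraparound behavior at the operand width is an assumption of this sketch rather than a requirement of the disclosure.

```python
def pointer_add(pointer: int, increment: int, current_pointer_size: int) -> int:
    """Add to a pointer using data of the dynamically determined size.

    Assumption for illustration: the result is truncated to the
    determined operand width, so arithmetic wraps at that width.
    """
    if current_pointer_size not in (32, 64):
        raise ValueError("unsupported pointer size")
    mask = (1 << current_pointer_size) - 1
    return (pointer + increment) & mask
```

For example, under this assumption, adding 1 to 0xFFFFFFFF yields 0 when the current pointer size is 32 bits and 0x100000000 when the current pointer size is 64 bits, even though the instruction encoding is the same in both cases.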

The flow 2200 includes causing the execution unit to perform a memory access 2250 of the dynamically determined size of data. The memory access can access memory within the execution unit, memory coupled to or in communication with the execution unit, etc. The memory can include direct memory access (DMA) memory. In response to determining that the received instruction is one of the one or more pointer-size controlled memory access instructions, the flow 2200 includes dynamically determining a unit of an identified offset 2252 based on the current pointer size. The unit can be a data portion or size unit and can be used to determine how much data is to be sent to memory or received from memory. The memory access can be based on the unit of the identified offset. The data sending to memory or data receiving from memory can result from executing a memory access instruction. The unit that is determined can include bytes, words, double words, and the like. In embodiments, the one or more control signals can cause the execution unit to perform the memory access of the dynamically determined size of data based on the identified offset in the determined unit. In embodiments, the determined unit is one of words and double words. Various steps in the flow 2200 may be changed in order, repeated, omitted, or the like without departing from the disclosed concepts.
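The effect of determining the unit of the identified offset from the current pointer size can be sketched as follows. The sketch assumes that a word is 32 bits (4 bytes) and a double word is 64 bits (8 bytes), and that the offset field counts whole units; the function name is hypothetical.

```python
def effective_address(base: int, offset: int, current_pointer_size: int):
    """Scale an encoded offset by a unit chosen from the pointer size.

    Assumptions for illustration: a 32-bit pointer selects a unit of
    one word (4 bytes) and a 64-bit pointer selects a unit of one
    double word (8 bytes).
    """
    if current_pointer_size not in (32, 64):
        raise ValueError("unsupported pointer size")
    unit_bytes = current_pointer_size // 8
    return base + offset * unit_bytes, unit_bytes


addr32, unit32 = effective_address(0x1000, 3, 32)
addr64, unit64 = effective_address(0x1000, 3, 64)
```

With the same encoded offset of 3, the access lands 12 bytes past the base when the unit is a word and 24 bytes past the base when the unit is a double word, so a fixed-width offset field indexes the same number of pointer-sized elements for either pointer size.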

FIG. 23 is a flow diagram for mode switching instructions. Processor instruction manipulation can include mode switching instructions for pointer-size controlled instruction processing. The mode switching instructions can be executed by a processor. The flow 2300 includes receiving an instruction for execution 2310 by an execution unit of a processor. The received instruction can include an instruction from a set that includes mode switching instructions, where the mode switching instructions enable support of a range of memory addressing modes. The processor on which the instructions are executed includes a modeless instruction set architecture (ISA) processor. In embodiments, the modeless architecture can enable different addressing for code-density oriented instructions only. The instruction can include a data transfer instruction, data manipulation instruction, program or code control instruction, and so on. Instructions such as data transfer instructions typically include loading data and storing data. The loading data and storing data can be based on an address that can indicate a location in memory, a register, etc.

The flow 2300 includes determining that the received instruction 2320 is one of the one or more pointer-size controlled memory access instructions. As discussed throughout, the received instruction can transfer or manipulate data, control the program or code, etc. The instruction can explicitly include information such as bits that define an address space, a register, a pointer register such as a global pointer register, and so on. The instruction can implicitly define an address space, global pointer register, etc. Use of an implicitly defined address space, global pointer register, and the like, can enable repurposing of bits within a pointer-size controlled memory access instruction for such uses as additional address bits. The implicit instruction can also be shorter than an explicit instruction. Various addressing techniques can be used for accessing memory for loading and storing data. In embodiments, the code-density oriented instructions can be enabled for N-bit addressing and 2N-bit addressing 2322, based on an implicitly defined address space, and in embodiments, N is equal to 32 and 2N is equal to 64. The received instruction can include a variable length instruction. The received instruction can be an instruction from a variable length instruction set that includes one or more instructions of a first length and one or more instructions of a second length. The first length can be shorter than the second length, and the one or more instructions of the first length can include one or more pointer-size controlled memory access instructions.

The flow 2300 can include executing the instruction on a processor 2330. The processor can include a processor of a plurality of processors that can execute instructions independently, in parallel, and so on. The processor can be based on an ISA, where the instruction set can include a complex instruction set, a reduced instruction set, etc. In embodiments, the instruction can be executed on a processor, where the processor is a reduced instruction set computer (RISC) 2332. In embodiments, the processor can include a processing element within a reconfigurable fabric. When the received instruction is determined to be a pointer-size controlled memory access instruction, the determining can include determining dynamically a size of data to be accessed based on a current pointer size. The size of data can be determined based on units, where the units can include bytes, fractions of words, words, double words, blocks, and the like. The size of data can be further based on address spaces, where the spaces can be addressed based on the number of bits used for the address. As discussed throughout, the processor can include a modeless architecture 2334. For a modeless architecture, various types of instructions can be included, where types of instructions can enable one or more different addressing 2336 techniques. Further to a modeless architecture, an instruction function can be unmodified between N-bit and 2N-bit address spaces. Various steps in the flow 2300 may be changed in order, repeated, omitted, or the like without departing from the disclosed concepts.

FIG. 24 is a system diagram for pointer-size controlled instruction processing. The system 2400 can include one or more processors 2410 coupled to a memory 2412 which stores instructions. The system 2400 can include a display 2414 coupled to the one or more processors 2410 for displaying data, intermediate steps, instructions, program counters, instruction counters, control signals, and so on. In embodiments, one or more processors 2410 are attached to the memory 2412 where the one or more processors, when executing the instructions which are stored, are configured to: receive an instruction for execution by an execution unit of the data processing apparatus, the received instruction being an instruction from a variable length instruction set comprising one or more instructions of a first length and one or more instructions of a second length wherein the first length is shorter than the second length, the one or more instructions of the first length comprising one or more pointer-size controlled memory access instructions; and in response to determining that the received instruction is one of the one or more pointer-size controlled memory access instructions, dynamically determine a size of data to be accessed based on a current pointer size, and output one or more control signals that cause the execution unit to perform a memory access of the dynamically determined size of data. The system 2400 can include a collection of instructions and data 2420. The instructions and data 2420 may be stored in a database, one or more statically linked libraries, one or more dynamically linked libraries, precompiled headers, source code, or other suitable formats. The instructions can include instructions for pointer-size controlled instruction processing. The instructions can include instructions for decoding instructions at a data processing apparatus. The processing apparatus may be realized using processing elements within a reconfigurable fabric.

The system 2400 can include a receiving component 2430. The receiving component can include functions and instructions for receiving an instruction for execution by an execution unit of the data processing apparatus. The received instruction can be an instruction from a variable length instruction set comprising one or more instructions of a first length and one or more instructions of a second length, where the first length is shorter than the second length. The one or more instructions of the first length include one or more pointer-size controlled memory access instructions. The instructions with the first length and the instructions with the second length can include instructions for manipulating data, loading data, storing data, analyzing data, comparing data, and so on. Instructions with the first length and instructions with the second length may be different in length based on addressing such as immediate or indirect addressing, numbers of operands within the instructions, and so on.

The system 2400 can include a determining component 2440. The determining component 2440 can include functions and instructions for determining that the received instruction is one of the one or more pointer-size controlled memory access instructions. The determining component can determine whether the received instruction is a pointer-size controlled memory access instruction or another type of instruction based on an operation code or “opcode”, flag bits, status bits, or control bits within the instruction, on instruction length, and so on. The determining component can include functions and instructions for dynamically determining a size of data to be accessed based on a current pointer size. The size of data to be accessed based on the current pointer size can include bytes, words, fractions of words, blocks, files, registers, and so on. The system 2400 can include an outputting component 2450. The outputting component can include functions and instructions for outputting one or more control signals that cause the execution unit to perform a memory access of the dynamically determined size of data. The one or more control signals can cause the execution unit to perform a memory access to local memory, memory that can be coupled to or in communication with the execution unit, direct memory access (DMA) memory, etc.
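One possible arrangement of the determining component and the outputting component is sketched below by way of example. The opcode table, the 16-bit first length, and the fields of the control signal record are hypothetical and are assumed only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_LENGTH_BITS = 16  # assumed length of the shorter instructions

# Hypothetical opcodes of pointer-size controlled memory access
# instructions, mapped to whether the access is a store.
POINTER_SIZE_CONTROLLED_OPCODES = {0x1: False, 0x3: True}


@dataclass(frozen=True)
class ControlSignals:
    """Illustrative control signals sent to the execution unit."""
    is_store: bool
    access_bits: int


def is_pointer_size_controlled(opcode: int, length_bits: int) -> bool:
    """Determining component: check instruction length and opcode."""
    return (length_bits == FIRST_LENGTH_BITS
            and opcode in POINTER_SIZE_CONTROLLED_OPCODES)


def output_control_signals(opcode: int, length_bits: int,
                           current_pointer_size: int) -> Optional[ControlSignals]:
    """Outputting component: emit signals for the determined access size."""
    if not is_pointer_size_controlled(opcode, length_bits):
        return None  # other instruction types are handled elsewhere
    return ControlSignals(
        is_store=POINTER_SIZE_CONTROLLED_OPCODES[opcode],
        access_bits=current_pointer_size,
    )
```

In this sketch the determination relies on the opcode and the instruction length, which are two of the criteria named above; flag bits, status bits, or control bits could be checked in a similar manner.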

The system 2400 can include a computer program product embodied in a non-transitory computer readable medium for decoding instructions at a data processing apparatus, the computer program product comprising code which causes one or more processors to perform operations of: receiving an instruction for execution by an execution unit of the data processing apparatus, the received instruction being an instruction from a variable length instruction set comprising one or more instructions of a first length and one or more instructions of a second length wherein the first length is shorter than the second length, the one or more instructions of the first length comprising one or more pointer-size controlled memory access instructions; and in response to determining that the received instruction is one of the one or more pointer-size controlled memory access instructions, dynamically determining a size of data to be accessed based on a current pointer size, and outputting one or more control signals that cause the execution unit to perform a memory access of the dynamically determined size of data.

Each of the above methods may be executed on one or more processors on one or more computer systems. Embodiments may include various forms of distributed computing, client/server computing, and cloud-based computing. Further, it will be understood that the depicted steps or boxes contained in this disclosure's flow charts are solely illustrative and explanatory. The steps may be modified, omitted, repeated, or re-ordered without departing from the scope of this disclosure. Further, each step may contain one or more sub-steps. While the foregoing drawings and description set forth functional aspects of the disclosed systems, no particular implementation or arrangement of software and/or hardware should be inferred from these descriptions unless explicitly stated or otherwise clear from the context. All such arrangements of software and/or hardware are intended to fall within the scope of this disclosure.

The block diagrams and flowchart illustrations depict methods, apparatus, systems, and computer program products. The elements and combinations of elements in the block diagrams and flow diagrams, show functions, steps, or groups of steps of the methods, apparatus, systems, computer program products and/or computer-implemented methods. Any and all such functions—generally referred to herein as a “circuit,” “module,” or “system”—may be implemented by computer program instructions, by special-purpose hardware-based computer systems, by combinations of special purpose hardware and computer instructions, by combinations of general purpose hardware and computer instructions, and so on.

A programmable apparatus which executes any of the above-mentioned computer program products or computer-implemented methods may include one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors, programmable devices, programmable gate arrays, programmable array logic, memory devices, application specific integrated circuits, or the like. Each may be suitably employed or configured to process computer program instructions, execute computer logic, store computer data, and so on.

It will be understood that a computer may include a computer program product from a computer-readable storage medium and that this medium may be internal or external, removable and replaceable, or fixed. In addition, a computer may include a Basic Input/Output System (BIOS), firmware, an operating system, a database, or the like that may include, interface with, or support the software and hardware described herein.

Embodiments of the present invention are limited to neither conventional computer applications nor the programmable apparatus that run them. To illustrate: the embodiments of the presently claimed invention could include an optical computer, quantum computer, analog computer, or the like. A computer program may be loaded onto a computer to produce a particular machine that may perform any and all of the depicted functions. This particular machine provides a means for carrying out any and all of the depicted functions.

Any combination of one or more computer readable media may be utilized including but not limited to: a non-transitory computer readable medium for storage; an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor computer readable storage medium or any suitable combination of the foregoing; a portable computer diskette; a hard disk; a random access memory (RAM); a read-only memory (ROM); an erasable programmable read-only memory (EPROM, Flash, MRAM, FeRAM, or phase change memory); an optical fiber; a portable compact disc; an optical storage device; a magnetic storage device; or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

It will be appreciated that computer program instructions may include computer executable code. A variety of languages for expressing computer program instructions may include without limitation C, C++, Java, JavaScript™, ActionScript™, assembly language, Lisp, Perl, Tcl, Python, Ruby, hardware description languages, database programming languages, functional programming languages, imperative programming languages, and so on. In embodiments, computer program instructions may be stored, compiled, or interpreted to run on a computer, a programmable data processing apparatus, a heterogeneous combination of processors or processor architectures, and so on. Without limitation, embodiments of the present invention may take the form of web-based computer software, which includes client/server software, software-as-a-service, peer-to-peer software, or the like.

In embodiments, a computer may enable execution of computer program instructions including multiple programs or threads. The multiple programs or threads may be processed approximately simultaneously to enhance utilization of the processor and to facilitate substantially simultaneous functions. By way of implementation, any and all methods, program codes, program instructions, and the like described herein may be implemented in one or more threads which may in turn spawn other threads, which may themselves have priorities associated with them. In some embodiments, a computer may process these threads based on priority or other order.

Unless explicitly stated or otherwise clear from the context, the verbs “execute” and “process” may be used interchangeably to indicate execute, process, interpret, compile, assemble, link, load, or a combination of the foregoing. Therefore, embodiments that execute or process computer program instructions, computer-executable code, or the like may act upon the instructions or code in any and all of the ways described. Further, the method steps shown are intended to include any suitable method of causing one or more parties or entities to perform the steps. The parties performing a step, or portion of a step, need not be located within a particular geographic location or country boundary. For instance, if an entity located within the United States causes a method step, or portion thereof, to be performed outside of the United States then the method is considered to be performed in the United States by virtue of the causal entity.

While the invention has been disclosed in connection with preferred embodiments shown and described in detail, various modifications and improvements thereon will become apparent to those skilled in the art. Accordingly, the foregoing examples should not limit the spirit and scope of the present invention; rather it should be understood in the broadest sense allowable by law. 

What is claimed is:
 1. A method for processor instruction manipulation comprising: receiving an instruction for execution by an execution unit of a processor, the received instruction being an instruction from a variable length instruction set comprising one or more instructions of a first length and one or more instructions of a second length, wherein the first length is shorter than the second length and the one or more instructions of the first length comprising one or more pointer-size controlled memory access instructions; determining that the received instruction is one of the one or more pointer-size controlled memory access instructions; determining dynamically a size of data to be accessed based on a current pointer size; and outputting one or more control signals that cause the execution unit to perform a memory access of the dynamically determined size of data.
 2. The method of claim 1 wherein dynamically determining the size of data to be accessed based on the current pointer size comprises dynamically determining the size of data to be accessed to be a first size when the current pointer size is the first size and dynamically determining the size of data to be accessed to be a second size when the current pointer size is the second size.
 3. The method of claim 2 wherein the one or more instructions of the second length comprise a corresponding memory access instruction to one of the one or more pointer-size controlled memory access instructions, the corresponding memory access instruction identifying that data of the first size is to be accessed.
 4. The method of claim 3 further comprising determining that the received instruction is the corresponding memory access instruction.
 5. The method of claim 4 further comprising outputting one or more control signals that cause the execution unit to perform a memory access of the first size of data.
 6. The method of claim 2 wherein the one or more instructions of a first size or the one or more instructions of the second size comprise one or more pointer-size controlled arithmetic instructions, and wherein the method further comprises, in response to determining that the received instruction is one of the one or more pointer-size controlled arithmetic instructions, dynamically determining a size of data upon which an arithmetic operation is to be performed based on the current pointer size, and outputting one or more control signals that cause the execution unit to perform an arithmetic operation on data of the determined size.
 7. The method of claim 1 further comprising dynamically determining a unit of an identified offset based on the current pointer size, based on determining that the received instruction is one of the one or more pointer-size controlled memory access instructions.
 8. The method of claim 7 wherein the one or more control signals cause the execution unit to perform the memory access of the dynamically determined size of data based on the identified offset in the determined unit.
 9. (canceled)
 10. The method of claim 1 wherein the one or more pointer-size controlled memory access instructions comprise one or more pointer-size controlled load instructions.
 11. The method of claim 10 wherein the one or more pointer-size controlled load instructions comprise at least two pointer-size controlled load instructions that identify a base register in different manners.
 12. The method of claim 11 wherein the at least two pointer-size controlled load instructions that identify the base register in different manners comprise a pointer-size controlled load instruction that explicitly identifies the base register and a pointer-size controlled load instruction that implicitly identifies the base register.
 13. The method of claim 10 wherein the one or more pointer-size controlled load instructions comprise at least two pointer-size controlled load instructions that identify a base register using a different number of bits.
 14. The method of claim 1 wherein the one or more pointer-size controlled memory access instructions comprise one or more pointer-size controlled store instructions.
 15. The method of claim 14 wherein the one or more pointer-size controlled store instructions comprise at least two pointer-size controlled store instructions that identify a base register in different manners.
 16. The method of claim 15 wherein the at least two pointer-size controlled store instructions that identify the base register in different manners comprise a pointer-size controlled store instruction that explicitly identifies the base register and a pointer-size controlled store instruction that implicitly identifies the base register.
 17. The method of claim 14 wherein the one or more pointer-size controlled store instructions comprise at least two pointer-size controlled store instructions that identify a base register using a different number of bits.
 18. The method of claim 1 wherein the one or more pointer-size controlled memory access instructions comprise one or more pointer-size controlled arithmetic instructions.
 19. The method of claim 18 wherein the one or more pointer-size controlled arithmetic instructions comprise at least one pointer-size controlled arithmetic instruction wherein the data upon which an arithmetic operation associated with the arithmetic instruction is performed is a pointer.
 20. The method of claim 19 wherein the pointer is one of a global data pointer, a stack pointer, and a program counter.
 21. (canceled)
 22. The method of claim 1 wherein the processor comprises a modeless architecture.
 23. The method of claim 22 wherein an instruction function is unmodified between N-bit and 2N-bit address spaces.
 24. The method of claim 22 wherein the modeless architecture enables different addressing for code-density oriented instructions only.
 25. The method of claim 24 wherein the code-density oriented instructions are enabled for N-bit addressing and 2N-bit addressing, based on an implicitly defined address space.
 26. A computer program product embodied in a non-transitory computer readable medium for processor instruction manipulation, the computer program product comprising code which causes one or more processors to perform operations of: receiving an instruction for execution by an execution unit of a processor, the received instruction being an instruction from a variable length instruction set comprising one or more instructions of a first length and one or more instructions of a second length, wherein the first length is shorter than the second length and the one or more instructions of the first length comprising one or more pointer-size controlled memory access instructions; determining that the received instruction is one of the one or more pointer-size controlled memory access instructions; determining dynamically a size of data to be accessed based on a current pointer size; and outputting one or more control signals that cause the execution unit to perform a memory access of the dynamically determined size of data.
 27. A computer system for processor instruction manipulation comprising: a memory which stores instructions; one or more processors attached to the memory wherein the one or more processors, when executing the instructions which are stored, are configured to: receive an instruction for execution by an execution unit of a processor, the received instruction being an instruction from a variable length instruction set comprising one or more instructions of a first length and one or more instructions of a second length, wherein the first length is shorter than the second length and the one or more instructions of the first length comprising one or more pointer-size controlled memory access instructions; determine that the received instruction is one of the one or more pointer-size controlled memory access instructions; determine dynamically a size of data to be accessed based on a current pointer size; and output one or more control signals that cause the execution unit to perform a memory access of the dynamically determined size of data. 